Abstract


Twenty-five years ago, the two main pillars of quantum chemistry—density functional 
and composite ab initio theories—were recognized with a Nobel Prize in Chemistry 
awarded to Walter Kohn and John Pople. This recognition sparked intense theoretical 
developments in both fields. Whereas in 1998, the year the Nobel Prize was 
awarded, there were only a handful of composite ab initio methods (most notably
the Gaussian-n methods (n = 1-3), CBS methods (e.g., CBS-QCI and CBS-APNO),
and the focal-point analysis approach), today there are many more families of
such methods, including the Weizmann-n, MCCM, HEAT, ccCA, FPD, ATOMIC, 
INT-MP2-F12, and ChS families of methods, where some of these families include dozens
of variants. Overall, there are over 100 contemporary variants of composite ab initio
methods to choose from, with many variants implemented as a keyword in popular 
quantum chemical packages. This situation makes it difficult to choose a proper method 
for a given chemical system, property, and desired accuracy. This chapter provides 
an overview of contemporary composite ab initio methods applicable to first- and 
second-row elements, their main energetic components, and their expected accuracy 
and applicability. To guide the selection of a suitable method for a given chemical 
system and desired accuracy, the various methods are classified according to a 
‘Jacob's Ladder’ of composite ab initio methods, from computationally economical 
methods that are capable of approaching chemical accuracy to computationally 
demanding methods capable of confident sub-benchmark accuracy. 


1. Introduction 


Shortly after the Schrödinger equation was developed, arguably the
greatest physicist of the 20th century, Paul Dirac, stated that quantum
mechanics could be used to understand “the whole of chemistry” (1).
Such a bold statement would have seemed like a far-reaching dream at a time
when the new theory was only applicable to one- and two-electron systems
like the helium atom and the H₂ molecule (2). Indeed, Dirac quickly hedged
this statement by adding “the difficulty is only that the exact application
of these laws leads to equations much too complicated to be soluble.”
This statement is still true today; however, it was Dirac’s next sentence that
shaped the development of quantum chemistry over the past century: “It
therefore becomes desirable that approximate practical methods of applying
quantum mechanics should be developed, which can lead to an explanation
of the main features of complex atomic systems.” In this insightful quote,
which was written well before the age of computers, let alone supercomputers,
Dirac captures quite succinctly two key aspects of contemporary
computational quantum chemistry: (i) the development of computationally
economical quantum chemical theories and (ii) the application of these
theories for exploring and predicting the electronic structure of molecules
and materials. For most of the 20th century, quantum chemical theories 
were still applicable to fairly small systems; however, owing to significant 
developments in quantum chemical theory and supercomputer technology 
over the past three decades, Dirac’s dream has now been realized for 
complex chemical systems across the Periodic Table (3-5). A major
stepping-stone in realizing this goal was the development of quantum
chemical theories that are both highly accurate (i.e., capable of thermochemical
and kinetic predictions with confident sub-kcal/mol accuracy)
and applicable to medium-sized chemical systems with dozens of atoms.
These quantum chemical theories are commonly referred to as composite 
ab initio methods (sometimes also called ab initio thermochemistry methods or
compound thermochemistry methods). These procedures use a series of high-level
ab initio calculations to obtain accurate thermochemical and kinetic data 
comparable to experimental data. This field began with the development 
of the so-called Gaussian-1 (G1) theory by Pople and co-workers in the late 
1980s (6, 7). Subsequently, major advances in quantum chemical theory and 
high-performance supercomputer technology have allowed this field to 
flourish into a widely applied subfield of quantum chemistry. Over the 
past 30-odd years, a wide range of composite ab initio methods has been 
developed. Popular examples include the Gaussian-n (Gn) methods
(6-10) and variants thereof (11-18), complete basis set (CBS) model chemistries
(19-25), focal-point analysis (FPA) (26-30), Weizmann-n (Wn)
(31-36), WnX (37-39), multi-coefficient correlation methods (MCCMs)
(40-45), high-accuracy extrapolated ab initio thermochemistry (HEAT)
(46-50), correlation consistent composite approach (ccCA) (51-60),
Feller-Peterson-Dixon (FPD) (61-67), ab initio thermochemistry using
optimal-balance models with isodesmic corrections (ATOMIC) (68-70),
interference-corrected explicitly correlated second-order perturbation theory
(INT-MP2-F12) (71), and the so-called cheap composite scheme (ChS)
(72,73) procedures. Today, composite ab initio methods are among the
most accurate means for examining chemical processes at the atomic level. 
These theories allow quantum mechanics not only to explain the chemistry 
of complex molecules but also to obtain chemical properties (such as 
reaction energies and barrier heights) with accuracy that rivals or exceeds 
the most accurate experiments. Thus, these methods are instrumental in 
(i) investigating transient species that are difficult to study experimentally 
(e.g., free radicals, transition structures, and short-lived reaction intermedi- 
ates), (ii) modeling kinetics and mechanisms of challenging chemical 
reactions, and (iii) analyzing and explaining observed experimental trends. 
This chapter gives an overview of the various types of composite 
ab initio methods that have been developed over the past 30-odd years with 
an emphasis on key theoretical developments, design philosophy, confident 
sub-chemical accuracy, and strategies for choosing an appropriate method 
for a given chemical problem and desired level of accuracy. The present
review focuses on main-group first- and second-row chemistry; for additional
in-depth overviews of composite ab initio methods, the reader is
referred to several excellent reviews that have been published over the past
two decades (61,62,74-91). (For comprehensive reviews of coupled-cluster
theory, see references (92-95) and references therein.)


1.1 Chemical accuracy and theoretical uncertainty 


A primary goal of composite ab initio methods is to obtain accurate 
thermochemical and kinetic properties, if possible, with well-defined error 
bars. It is helpful to begin by discussing two levels of accuracy that are 
extensively used in computational thermochemistry, namely “chemical 
accuracy” and “benchmark accuracy.” Chemical accuracy refers to an 
energy unit of 1.0 kcal mol⁻¹ = 4.184 kJ mol⁻¹. It is convenient to define
this unit of energy since it represents about 1% of the bond dissociation
energy (BDE) of typical covalent bonds. Incidentally, ~1 kcal mol⁻¹ also
represents a typical error bar for experimental tabulations in thermochemical 
databases such as the National Institute of Standards and Technology (NIST) 
(96), which were instrumental in calibrating and benchmarking the 
first-generation composite ab initio methods. 

It should be noted that the more recent Active
Thermochemical Tables (ATcT) approach developed by Ruscic and 
co-workers is capable of producing thermochemical data with markedly 
lower error bars (97,98). The development of ATcT played a key role in 
the development of high-level composite ab initio methods capable of 
sub-benchmark accuracy, vide infra. This illustrates that the development 
of more accurate next-generation theoretical procedures is often limited 
by the availability of sufficiently accurate and reliable experimental data 
needed to evaluate the performance of such theoretical procedures. 

Indeed, the term “chemical accuracy” started gaining broad use in computational
thermochemistry in the early 1990s alongside the development of
the first CCSD(T) or QCISD(T)-based composite ab initio methods (99).
This terminology has been instrumental in setting a target accuracy for
the early thermochemical methods. However, it should be stressed that
chemical accuracy refers to a level of accuracy of 1 kcal mol⁻¹, but it
does not specify how this level of accuracy should be quantified. For example,
chemical accuracy may refer to a mean-absolute deviation (MAD) <1 kcal mol⁻¹,
a root-mean-square deviation (RMSD) <1 kcal mol⁻¹, or a
95% confidence interval (CI) <1 kcal mol⁻¹ relative to sufficiently accurate
experimental or theoretical data. The lack of a clear definition as to how
to quantify chemical accuracy diminishes the value of this terminology.
The quantification of uncertainty in electronic structure calculations
has been discussed in detail by Ruscic (100), and we reiterate the recommendation
that a 95% CI is the most robust measure of uncertainty in composite
ab initio calculations. A 95% CI obtained for a given theoretical
procedure and thermochemical property, relative to a sufficiently accurate
and large/representative benchmark dataset, means that a value calculated
following the same procedure should lie within the quoted uncertainty
19 times out of 20 for species outside the dataset. Similarly, Peterson,
Feller, and Dixon (79) suggested that to achieve an experimentalist’s notion
of chemical accuracy, an appropriate definition would be twice the MAD
being below 1 kcal mol⁻¹.

It is also important to stress that the performance of any quantum 
chemical method depends on the chemical property being calculated and 
the size/diversity of the dataset being used for the performance evaluation. 
For example, a given quantum chemical procedure may achieve chemical 
accuracy for isodesmic bond-separation energies but not for total atomiza- 
tion energies (TAEs) (101,102); or achieve chemical accuracy for TAEs of 
species dominated by dynamical correlation but not TAEs of multireference 
species (80,103). 

The term chemical accuracy is widely used in various contexts to 
describe the level of accuracy of wavefunction-based methods and even 
in the context of density functional theory (DFT) calculations. Here we note 
that the following parameters should be specified in order for the term 
chemical accuracy to be more meaningful: 

• The statistical metric used for defining chemical accuracy (e.g., MAD,
RMSD, or 95% CI)
• The composition of the benchmark dataset used for the evaluation (e.g.,
in terms of the elemental composition or multireference character of the
species involved)
• The chemical property that is being considered (e.g., TAEs, conformational
energies, reaction barrier heights, or non-covalent interactions)

We note that once the statistical metric used for defining chemical accuracy
is specified, it can be converted to a different one using the following
guidelines (80,100):

• $\mathrm{MAD} \approx \sqrt{2/\pi} \times \mathrm{RMSD} \approx 0.8 \times \mathrm{RMSD}$ (for a normal error distribution
with a small systematic error)
• 95% CI $\approx 2 \times$ RMSD
• 95% CI $\approx 2.5 \times$ MAD (for a normal error distribution with a small
systematic error; however, as noted by Ruscic (100), the conversion
factor can reach up to 3.5 depending on the error distribution)
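
The conversion guidelines above can be checked numerically. The following is a minimal Python sketch, assuming zero-mean, normally distributed errors; the errors are synthetic (drawn with the standard-library random module), not real benchmark data, and all function names are illustrative.

```python
import math
import random


def mad(errors):
    """Mean absolute deviation of a list of signed errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmsd(errors):
    """Root-mean-square deviation of a list of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def ci95_from_rmsd(rmsd_value):
    """Guideline from the text: 95% CI ~ 2 x RMSD."""
    return 2.0 * rmsd_value


def ci95_from_mad(mad_value, factor=2.5):
    """Guideline from the text: 95% CI ~ 2.5 x MAD (the factor may reach 3.5)."""
    return factor * mad_value


# Synthetic, zero-mean normal errors in kcal/mol (illustrative only).
rng = random.Random(0)
errors = [rng.gauss(0.0, 0.5) for _ in range(100_000)]

ratio = mad(errors) / rmsd(errors)
print(f"MAD/RMSD = {ratio:.3f} (sqrt(2/pi) = {math.sqrt(2 / math.pi):.3f})")
print(f"95% CI from RMSD = {ci95_from_rmsd(rmsd(errors)):.3f} kcal/mol")
print(f"95% CI from MAD  = {ci95_from_mad(mad(errors)):.3f} kcal/mol")
```

For a normal distribution the two routes to the 95% CI agree closely; for skewed or heavy-tailed error distributions they need not, which is why the statistical metric should always be stated.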

In the context of thermochemical and kinetic properties, such as bond
dissociation energies, heats of formation, and reaction barrier heights, it is
well accepted that chemical accuracy refers to 1 kcal mol⁻¹. However,
chemical accuracy may refer to smaller energetic thresholds for less challenging
thermochemical properties. By less challenging properties, we mean
properties that benefit from a larger degree of systematic error cancelation
between reactants and products. For example, it has been noted by
Mardirossian and Head-Gordon (104) and by Mehta et al. (105) that a value
of 0.1 kcal mol⁻¹ is more appropriate in the context of nonbonded interactions.
Such nonbonded properties may include hydrogen and halogen
bonding, dispersion interactions, and conformational isomerizations that
do not involve covalent bond breaking (106).

High-level composite ab initio methods that include contributions 
beyond the CCSD(T) level can obtain thermochemical and kinetic data 
with confident sub-chemical accuracy. Thus, it is helpful to define another 
level of accuracy of 1.0 kJ mol⁻¹ = 0.239 kcal mol⁻¹, which is commonly
referred to as “benchmark accuracy.” It has been found that post-CCSD(T)
composite ab initio methods such as the Weizmann-4 (W4) theory are capable
of obtaining TAEs with confident sub-kJ mol⁻¹ accuracy (i.e., 95% confidence
intervals <1 kJ mol⁻¹, and maximal errors below ~1 kJ mol⁻¹ even for pathologically
multireference systems such as ozone, halogen oxides, and carbon
clusters) (33,76). Notably, this level of accuracy surpasses that of many traditional
experimental thermochemical tabulations such as the NIST Chemistry
WebBook (96) and Computational Chemistry Comparison and Benchmark 
DataBase (CCCBDB) (107). However, the Active Thermochemical Tables, 
in general, have substantially higher accuracy. Having said that, it should 
also be pointed out that post-CCSD(T) composite ab initio methods 
such as W4 theory are only applicable to relatively small molecules with up
to ~8 nonhydrogen atoms (e.g., CCl, SiF4, C6H6, SF6, and CCl.)
(76,80,103,108).

Composite ab initio methods are also used for the calculation of spectroscopic
properties based on energy derivatives with respect to the nuclear
coordinates, such as equilibrium bond distances ($r_e$), harmonic vibrational
frequencies ($\omega_e$), and first-order anharmonic corrections ($\omega_e x_e$). Table 1
gives an overview of common definitions for chemical and benchmark 
accuracies for thermochemical and spectroscopic properties. 
Table 1 Common (and suggested) definitions for “chemical” and “benchmark”
accuracies for different chemical properties.

Property                  Chemical accuracy                 Benchmark accuracy
Heats of formationᵃ       1.0 kcal mol⁻¹ = 4.2 kJ mol⁻¹     0.24 kcal mol⁻¹ = 1.0 kJ mol⁻¹
Weak interactions         0.1 kcal mol⁻¹ = 0.42 kJ mol⁻¹    0.024 kcal mol⁻¹ = 0.1 kJ mol⁻¹
Bond distances            0.005 Å                           0.001 Å
Vibrational frequencies   5.0 cm⁻¹                          1.0 cm⁻¹

ᵃ Or other chemical properties involving multiple breaking/forming of bonds.


1.2 Limitations of single-point energy ab initio calculations 


Table 2 gives the formal computational costs of relevant quantum chemical
methods. With current mainstream computer technology, it is possible to
run Hartree-Fock (HF) energy calculations for systems with hundreds of
non-hydrogen atoms. This also applies to hybrid density functional
theory, which involves both exact HF and DFT exchange. Hybrid DFT
has the same computational scaling as HF theory but is far more useful
for describing most chemical properties (104). Such calculations are normally
carried out in a single step, i.e., via a single-point energy (SPE)
calculation. A fundamental limitation of high-level ab initio methods is
the steep increase in computational cost with system size (Table 2).
For example, the CCSD(T) method, with a formal scaling of $N_{bas}^{7}$
(compared to ~$N_{bas}^{4}$ for hybrid DFT), is applicable only to systems with dozens
of non-hydrogen atoms. Likewise, post-CCSD(T) SPE calculations
are generally limited to systems with only a handful of non-hydrogen
atoms. Furthermore, due to the exceedingly slow basis set convergence of
coupled-cluster methods, very large Gaussian basis sets must be employed
in such calculations in order to obtain thermochemical data with chemical
or benchmark accuracy. Thus, even for very small molecules, it is impractical
to approach the full configuration interaction (FCI) complete basis-set
limit (CBS) via a single-point energy calculation.
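
To make the consequences of the formal scaling concrete, the following minimal Python sketch estimates how the cost of each method grows when the number of basis functions is doubled. It assumes cost proportional to N_bas raised to the formal exponent (4 for HF/hybrid DFT up to 10 for CCSDTQ) and ignores prefactors, iteration counts, and any integral screening or local approximations; the dictionary and function names are illustrative.

```python
# Formal scaling exponents p in cost ~ N_bas**p (prefactors ignored).
FORMAL_EXPONENT = {
    "HF / hybrid DFT": 4,
    "MP2": 5,
    "CCSD": 6,
    "CCSD(T)": 7,
    "CCSDT": 8,
    "CCSDT(Q)": 9,
    "CCSDTQ": 10,
}


def relative_cost(method, size_factor):
    """Factor by which the formal cost grows when N_bas is multiplied by size_factor."""
    return size_factor ** FORMAL_EXPONENT[method]


for method in FORMAL_EXPONENT:
    growth = relative_cost(method, 2)
    print(f"{method:16s} doubling N_bas multiplies the formal cost by {growth}")
```

Under these assumptions, doubling the basis set multiplies the formal cost of a CCSD(T) calculation by 2^7 = 128, but that of a CCSDTQ calculation by 2^10 = 1024.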

To illustrate the limitations of the SPE approach for approximating the 
exact solution to the nonrelativistic Schrödinger equation, let us consider the 
bond dissociation energy of the $\mathrm{C_2}(^1\Sigma_g^+)$ diatomic. Hereinafter, the regular
and augmented correlation-consistent basis sets are denoted by VnZ and 

Table 2 Overview of formal computational scaling of coupled-cluster methods.

Method              Computational scalingᵃ                                      Overallᵃ
HF and hybrid DFT   $N_{occ}^2 N_{virt}^2 N_{iter}$                             $N_{bas}^4$
MP2 and DHDFTᵇ      $N_{occ}^2 N_{virt}^3$                                      $N_{bas}^5$
CCSD                $N_{occ}^2 N_{virt}^4 N_{iter}$                             $N_{bas}^6$
CCSD(T)             ~$N_{occ}^2 N_{virt}^4 N_{iter} + N_{occ}^3 N_{virt}^4$     ~$N_{bas}^7$
CCSDT               $N_{occ}^3 N_{virt}^5 N_{iter}$                             $N_{bas}^8$
CCSDT(Q)            ~$N_{occ}^3 N_{virt}^5 N_{iter} + N_{occ}^4 N_{virt}^5$     ~$N_{bas}^9$
CCSDTQ              $N_{occ}^4 N_{virt}^6 N_{iter}$                             $N_{bas}^{10}$
CCSDTQ(5)           ~$N_{occ}^4 N_{virt}^6 N_{iter} + N_{occ}^5 N_{virt}^6$     ~$N_{bas}^{11}$
CCSDTQ5             $N_{occ}^5 N_{virt}^7 N_{iter}$                             $N_{bas}^{12}$
CCSDTQ5(6)          ~$N_{occ}^5 N_{virt}^7 N_{iter} + N_{occ}^6 N_{virt}^7$     ~$N_{bas}^{13}$
CCSDTQ56            $N_{occ}^6 N_{virt}^8 N_{iter}$                             $N_{bas}^{14}$

ᵃ $N_{occ}$ is the number of occupied orbitals, $N_{virt}$ is the number of unoccupied orbitals,
$N_{iter}$ is the number of iterations required to reach convergence, and $N_{bas}$ is the number of
basis functions.
ᵇ DHDFT = double-hybrid density functional theory.



AVnZ, respectively (where often AVnZ also indicates the omission of diffuse 
functions from hydrogen atoms), and the notation V{X,Y}Z indicates 
extrapolation from the VXZ and VYZ basis sets. Calculating the C2 BDE 
via a single-point energy CCSDTQ/VTZ calculation results in a BDE 
that is 6.08 kcal mol⁻¹ (!!) below the CCSDTQ/CBS BDE. Increasing
the basis set size in the CCSDTQ calculation still results in unacceptably 
large deviations; namely, the CCSDTQ/VnZ BDE underestimates the 
CCSDTQ/CBS BDE by 2.26 (VQZ), 1.11 (V5Z), and 0.65 (V6Z) 
kcal mol⁻¹. Thus, even the computationally demanding CCSDTQ/V6Z
calculation—which required several weeks to run on a computer node with 
512 GB of RAM—is unable to achieve benchmark accuracy. Needless to 
say, the CCSDTQ/V6Z level of theory is only feasible for light diatomic 
systems. However, even the CCSDTQ/VTZ level of theory is not practical 
for molecules with more than five non-hydrogen atoms using current main- 
stream computer hardware. Therefore, regardless of one’s computational 
resources, SPE calculations are not an effective way of approaching the 
FCI/CBS limit. 
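
The deviations quoted above can be tabulated against the chemical and benchmark accuracy thresholds, and used to illustrate what a two-point V{X,Y}Z extrapolation does. The following Python sketch is illustrative only: it assumes that the basis-set error of the BDE decays as A/n^α with the cardinal number n, with α = 3 chosen as a common textbook exponent rather than a value taken from this chapter, and it applies this form to the total deviation even though the HF and correlation components converge differently. The function name is hypothetical.

```python
# Deviations (kcal/mol) of the CCSDTQ/VnZ BDE of C2 from the CCSDTQ/CBS value,
# as quoted in the text; the key n is the cardinal number of the VnZ basis set.
DEVIATION = {3: 6.08, 4: 2.26, 5: 1.11, 6: 0.65}

CHEMICAL = 1.0            # chemical accuracy, kcal/mol
BENCHMARK = 1.0 / 4.184   # benchmark accuracy: 1 kJ/mol in kcal/mol (~0.239)


def residual_after_extrapolation(x, y, alpha=3.0):
    """Residual deviation after a two-point V{X,Y}Z extrapolation.

    Assumes error(n) = A / n**alpha; alpha = 3 is an assumption, not a
    parameter taken from the text.
    """
    d_x, d_y = DEVIATION[x], DEVIATION[y]
    return d_y - (d_x - d_y) / ((y / x) ** alpha - 1.0)


for n, dev in DEVIATION.items():
    print(f"V{n}Z: deviation {dev:.2f} kcal/mol, "
          f"chemical accuracy: {dev < CHEMICAL}, benchmark accuracy: {dev < BENCHMARK}")

for x, y in [(3, 4), (4, 5), (5, 6)]:
    print(f"V{{{x},{y}}}Z: residual {residual_after_extrapolation(x, y):+.2f} kcal/mol")
```

None of the unextrapolated values reaches benchmark accuracy, whereas under the stated assumptions the V{5,6}Z residual drops to roughly 0.02 kcal mol⁻¹ in magnitude. This is a consistency check on the quoted numbers, not a substitute for the extrapolation protocols of the actual composite methods.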
Fig. 1A illustrates the way in which single-point energy calculations
approach the exact solution to the nonrelativistic Schrödinger equation.
In this approach, both the one-particle and n-particle spaces are saturated
at the same time. The one-particle space describes the size of the basis
set used to express the orbitals in the Hartree-Fock wavefunction, and
the n-particle space describes the level of excitation included in the cluster
operator. Attempting to converge both spaces to completeness simultaneously
will indeed approach the exact solution. However, due to the steep
computational scaling of coupled-cluster methods (Table 2), this
approach is impractical even for very small molecules. For example, the
Hartree-Fock calculation scales as $N_{bas}^{4}$, the CCSD method scales as iterative
$N_{bas}^{6}$, and the CCSDTQ method scales as iterative $N_{bas}^{10}$ with respect to
the number of basis functions.

To illustrate how composite ab initio methods work, let us now consider 
the following reorganization of the CCSDTQ/VnZ energy: 


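$$
\begin{aligned}
\text{CCSDTQ/V}n\text{Z} ={}& \text{HF/V}n\text{Z} + [\text{CCSD} - \text{HF}]/\text{V}n\text{Z} \\
&+ [\text{CCSD(T)} - \text{CCSD}]/\text{V}n\text{Z} \\
&+ [\text{CCSDT} - \text{CCSD(T)}]/\text{V}n\text{Z} \\
&+ [\text{CCSDT(Q)} - \text{CCSDT}]/\text{V}n\text{Z} \\
&+ [\text{CCSDTQ} - \text{CCSDT(Q)}]/\text{V}n\text{Z}
\end{aligned}
\tag{1}
$$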


Eq. (1) is a simple breakdown of the CCSDTQ/VnZ energy into the SCF
and coupled-cluster correlation terms. However, it illustrates that using a
medium-sized basis set (e.g., VTZ) will result in large basis-set truncation
errors for the HF and lower-order correlation components. Hence the deviation
of 6.08 kcal mol⁻¹ is obtained for the $\mathrm{C_2}(^1\Sigma_g^+)$ example above. On the
other hand, using a large basis set (e.g., V6Z) will make the higher-order
correlation components impractical and, as shown above for $\mathrm{C_2}(^1\Sigma_g^+)$,
still results in a sizable basis set truncation error of 0.65 kcal mol⁻¹. The
same arguments apply if we replace the CCSDTQ/VnZ energy in
Eq. (1) with the FCI/VnZ energy.

[Figure 1. Panels: (A) Single-point energy calculations; (B) Composite ab initio methods. Vertical axis: n-particle space, from HF to FCI. Horizontal axis: one-particle space.]

Fig. 1 Modified Pople diagrams illustrating the different relationships between the
one- and n-particle spaces in (A) single-point energy calculations and (B) composite
ab initio methods, in which successively higher cluster expansion terms
(ΔCCSD → Δ(T) → ΔT−(T) → Δ(Q) → ΔQ−(Q) → etc.) converge increasingly faster with
the basis set size.

The underlying premise behind composite ab initio methods is that 
successively higher cluster expansion terms tend to converge increasingly 
faster with the basis set since they increasingly reflect nondynamical rather 
than dynamical correlation (76). Therefore, the computationally more 
demanding higher-level terms in Eq. (1) can be calculated with smaller 
and smaller basis sets. This accelerated convergence behavior is the main 
reason why post-CCSD(T) composite ab initio methods are applicable 
to much larger systems than high-level SPE calculations (e.g., CCSDTQ/ 
V6Z). This idea is illustrated in Fig. 1B and will be discussed in detail in
Section 2.2. 
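
The bookkeeping behind Eq. (1) and its composite counterpart can be sketched in a few lines of Python. The energies below are invented placeholders (in hartree), not computed values, and the assignment of basis sets to the individual terms is an assumption made for illustration rather than the protocol of any published method; the names LADDER, ENERGY, increment, and summed_energy are likewise hypothetical.

```python
LADDER = ["HF", "CCSD", "CCSD(T)", "CCSDT", "CCSDT(Q)", "CCSDTQ"]

# Hypothetical total energies (hartree) keyed by (method, basis set).
# These numbers are invented for illustration; they are not computed values.
ENERGY = {
    ("HF", "VTZ"): -75.4000,
    ("CCSD", "VTZ"): -75.7600,
    ("CCSD(T)", "VTZ"): -75.7800,
    ("CCSDT", "VTZ"): -75.7805,
    ("CCSDT(Q)", "VTZ"): -75.7825,
    ("CCSDTQ", "VTZ"): -75.7823,
    ("HF", "V5Z"): -75.4050,
    ("CCSD", "V5Z"): -75.7900,
    ("CCSDT", "VDZ"): -75.6500,
    ("CCSDT(Q)", "VDZ"): -75.6515,
    ("CCSDTQ", "VDZ"): -75.6514,
}


def increment(level, basis):
    """Return [level - previous level]/basis (the HF energy itself for HF)."""
    position = LADDER.index(level)
    energy = ENERGY[(level, basis)]
    if position == 0:
        return energy
    return energy - ENERGY[(LADDER[position - 1], basis)]


def summed_energy(basis_for_level):
    """Sum the terms of Eq. (1), each evaluated in the basis set assigned to it."""
    return sum(increment(level, basis_for_level[level]) for level in LADDER)


# Same basis for every term: the sum telescopes to the CCSDTQ/VTZ energy.
single_point = summed_energy({level: "VTZ" for level in LADDER})

# Composite: larger basis sets for low-level terms, smaller ones for high-level terms.
composite = summed_energy({
    "HF": "V5Z", "CCSD": "V5Z",
    "CCSD(T)": "VTZ", "CCSDT": "VTZ",
    "CCSDT(Q)": "VDZ", "CCSDTQ": "VDZ",
})

print(f"single-point CCSDTQ/VTZ: {single_point:.4f} Eh")
print(f"composite estimate:      {composite:.4f} Eh")
```

The composite sum never requires the expensive CCSDTQ calculation in anything larger than the smallest basis set, which is the practical content of the accelerated convergence argument.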


[Figure 2. Ladder diagram rising toward the exact electronic energy, with rungs labeled from “chemical or approaching chemical” accuracy to “sub-chemical” accuracy.]

Fig. 2 Jacob’s Ladder of composite ab initio methods. Each consecutive rung represents
a more rigorous treatment of the one- and/or n-particle space, including examples of
system sizes to which procedures from each rung can be applied using current
mainstream computer hardware.

1.3 The “zoo” of composite ab initio methods


Over the past three decades, there has been a proliferation in the number of
developed composite ab initio methods, including the Gn, CBS, FPA, Wn,
MCCM, HEAT, ccCA, FPD, ATOMIC, INT-MP2-F12, and ChS families
of methods, where some of these families include a dozen (or more)
different variants with different applicability and computational cost.
Overall, there are over a hundred different composite ab initio methods 
to choose from, and it has become increasingly difficult to choose the best 
method for a given chemical system, property, and desired accuracy. When 
choosing a composite ab initio method, one must consider several key
aspects, such as system size, elemental composition, multireference character,
electronic state, the chemical property of interest, and the desired accuracy.
Further aspects that may impose additional requirements on the
basis sets employed are bond polarity, formal oxidation state, and overall
charge. The various composite ab initio methods cover a wide range of
accuracies (from chemical to benchmark accuracy), applicability in terms 
of system size (from five to over 50 non-hydrogen atoms), applicability 
in terms of elemental composition (from methods applicable only to 
first-row systems to pan-periodic table methods), and different chemical 
properties (from atomization energies to properties that depend on energy 
derivatives such as equilibrium structures and vibrational frequencies). In 
addition, it should be mentioned that a few composite ab initio methods 
have been specifically designed for treating excited electronic states and 
challenging PESs (109-111). 
In the context of the present review, it is convenient to classify the 
composite ab initio methods into four categories: 
• Methods that combine second-order Møller-Plesset perturbation theory
(MP2) and CCSD(T) calculations
• Methods that rely purely on coupled-cluster calculations up to CCSD(T)
• Methods that rely purely on coupled-cluster calculations up to
CCSDT(Q)
• Methods that rely purely on coupled-cluster calculations up to
CCSDTQ5 (or higher)
Methods that involve both MP2 and CCSD(T) calculations (hereinafter 
referred to as hybrid CCSD(T)/MP2 methods) use relatively large basis sets 
in the MP2 steps and smaller basis sets in the CCSD(T) and higher-order 
MPn calculations. Purely CCSD(T)-based methods use larger basis sets 
in the CCSD(T) calculations compared to the hybrid CCSD(T)/MP2 
methods. Post-CCSD(T) composite approaches may use even larger basis 
sets in the CCSD(T) calculations and additionally employ contributions up
to the CCSDT(Q) or CCSDTQ5 level. Both the accuracy and computational
cost of composite ab initio methods increase in the order:


hybrid CCSD(T)/MP2 → pure CCSD(T) → pure CCSDT(Q) → pure CCSDTQ5


Fig. 2 depicts a proposed Jacob’s Ladder of composite ab initio methods. 
In this framework, each consecutive rung represents a more rigorous treat- 
ment of either the one- or n-particle space, along with an increase in the 
computational cost. The move from rung 1 (hybrid CCSD(T)/MP2 
methods) to rung 2 (pure CCSD(T) methods) represents a more rigorous 
treatment of the one-particle space. The move from rung 2 to rung 
3 (CCSDT(Q) methods) represents a more rigorous treatment of the 
n-particle space, which is often accompanied by a more rigorous treatment 
of the one-particle space at the CCSD(T) level. The move from rung 3 to 
rung 4 (CCSDTQ5 or higher methods) represents a more rigorous treat- 
ment of the n-particle space. The methods on the first two rungs are 
normally capable of chemical accuracy (or approaching chemical accuracy) 
for TAEs. The methods on the third rung are capable of approaching benchmark
accuracy for TAEs. The methods on the fourth rung are capable of
sub-benchmark accuracy for TAEs. For example, the Weizmann-4 and
HEAT-456QP methods, which are amongst the most accurate methods
of the fourth rung, attain RMSDs of 0.072 and 0.100 kcal mol⁻¹,
respectively (76), for a set of highly accurate TAEs obtained from the Active
Thermochemical Tables thermochemical network (97,98). These
RMSDs translate to 95% confidence intervals lower than 1 kJ mol⁻¹ (76).
The performance of the methods from the middle rungs has been evaluated
for a large dataset of TAEs relative to CCSDTQ5/CBS data obtained
from W4 theory (or higher). For example, for a diverse set of 124
non-multireference TAEs in the W4-11 database, W1U and W1RO theories
from the second rung attain RMSDs of 0.57 and 0.65 kcal mol⁻¹,
respectively (80). For comparison, methods from the first rung such as
G4(MP2) (112) and ccCA-PS3 (55), attain a MAD of 1.04 and kcal 
mol⁻¹ for the experimental energies of the G3/05 test set. We note that
the 95% confidence intervals for the above methods from the first and
second rungs of Jacob’s Ladder exceed 1 kcal mol⁻¹. Thus, in a stricter
sense, these methods do not attain confident chemical accuracy for TAEs
(see Section 4 for a more detailed discussion of performance). It is also 
important to keep in mind that, in a similar manner to Jacob’s Ladder of 
DFT, Jacob’s Ladder of composite ab initio methods is only a general framework
for an increase in accuracy and computational cost, and methods
from higher rungs are not always guaranteed to be more accurate than
methods from a lower rung for any given chemical system and property.
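
The statistics quoted in this section can be cross-checked against the rules of thumb of Section 1.1 (95% CI ≈ 2 × RMSD ≈ 2.5 × MAD). The following minimal Python sketch does this arithmetic; it assumes that these rules of thumb apply to the underlying error distributions, which need not hold exactly, and the function names are illustrative.

```python
KJ_PER_KCAL = 4.184


def ci95_from_rmsd(rmsd):
    """Rule of thumb: 95% CI ~ 2 x RMSD."""
    return 2.0 * rmsd


def ci95_from_mad(mad, factor=2.5):
    """Rule of thumb: 95% CI ~ 2.5 x MAD (the factor may reach 3.5)."""
    return factor * mad


# Statistics quoted in the text (kcal/mol).
RMSD = {"W4": 0.072, "HEAT-456QP": 0.100, "W1U": 0.57, "W1RO": 0.65}
MAD = {"G4(MP2)": 1.04}

estimates = {name: ci95_from_rmsd(value) for name, value in RMSD.items()}
estimates.update({name: ci95_from_mad(value) for name, value in MAD.items()})

for name, ci in estimates.items():
    print(f"{name:11s} 95% CI ~ {ci:.2f} kcal/mol = {ci * KJ_PER_KCAL:.2f} kJ/mol")
```

With these conversions, the two fourth-rung methods fall below 1 kJ mol⁻¹, whereas the W1 variants and G4(MP2) exceed 1 kcal mol⁻¹, in line with the statements above.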


2. Nonrelativistic electronic energy 
2.1 Hybrid CCSD(T)/MP2 composite ab initio methods 


Hybrid CCSD(T)/MP2 methods are highly successful and cost-effective 
composite ab initio methods that employ lower-level CCSD(T) calculations 
in conjunction with large-scale, second-order Møller-Plesset perturbation
theory (MP2) calculations. Popular examples of such methods include the
Gn, CBS, and ccCA families of methods. The underlying approximation
in many of these methods is the following MP2-based additivity scheme:


$$
\text{CCSD(T)/Large} \approx \text{CCSD(T)/Small} + \text{MP2/Large} - \text{MP2/Small} + [\text{additional corrections}]
\tag{2}
$$


Here “Small” and “Large” represent different basis set sizes or basis-set 
extrapolation schemes. This additivity scheme can reduce the computa- 
tional cost by an order of magnitude relative to the computational cost of 
pure CCSD(T)-based approaches since it replaces a CCSD/Large or 
CCSD(T)/Large calculation with an MP2/Large calculation (Table 2). 
The success of this scheme relies on the similar basis-set convergence behavior
of the CCSD(T) and MP2 energies, which was found to be true for
thermochemical (113-115) and kinetic (116,117) properties, as well as
weak interactions (118-123).
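
The additivity scheme of Eq. (2), and the source of its error, can be illustrated with a short Python sketch. The energies are invented placeholders in hartree, not computed values, and the function name is hypothetical; the point is only that the error of the estimate equals the difference between the MP2 and CCSD(T) basis-set increments.

```python
def additivity_estimate(ccsdt_small, mp2_large, mp2_small, corrections=0.0):
    """Eq. (2): CCSD(T)/Large ~ CCSD(T)/Small + [MP2/Large - MP2/Small] + corrections."""
    return ccsdt_small + (mp2_large - mp2_small) + corrections


# Hypothetical total energies in hartree (invented for illustration only).
ccsdt_small, ccsdt_large = -76.2400, -76.3400
mp2_small, mp2_large = -76.2200, -76.3180

estimate = additivity_estimate(ccsdt_small, mp2_large, mp2_small)
error = estimate - ccsdt_large

# Basis-set increments (Large minus Small) at each level of theory.
mp2_increment = mp2_large - mp2_small
ccsdt_increment = ccsdt_large - ccsdt_small

print(f"estimate = {estimate:.4f} Eh, error = {error:+.4f} Eh")
print(f"MP2 increment - CCSD(T) increment = {mp2_increment - ccsdt_increment:+.4f} Eh")
```

The scheme is therefore exact whenever the two methods respond identically to the basis-set enlargement, and in practice the expensive CCSD(T)/Large calculation is never run.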

It is instructive to see what Eq. (2) looks like in two representative 
composite ab initio approaches, namely G4(MP2) (112) and ccCA-PS3 
(55). The computationally economical G4(MP2) method employs the 
Pople-style basis sets for all steps apart from the HF/CBS energy as well 
as an empirical higher-level correction (HLC) term. The ccCA-PS3 
method, on the other hand, employs the correlation-consistent basis sets 
in conjunction with basis set extrapolations and does not involve an empir- 
ical HLC term. G4(MP2) theory uses the following simple and elegant 
underlying expression for the nonrelativistic electronic energy: 


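$$
\begin{aligned}
E[\text{G4(MP2)}] ={}& E[\text{CCSD(T)/6-31G(d)}] + E[\text{MP2/G3MP2LargeXP}] \\
&- E[\text{MP2/6-31G(d)}] + E[\text{HF/CBS}] \\
&- E[\text{HF/G3MP2LargeXP}] + E(\text{HLC})
\end{aligned}
\tag{3}
$$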
Here, G3MP2LargeXP is an extended version of the Pople 6-311+G(3df,2p)
basis set with additional polarization functions, and HF/CBS indicates
extrapolation of the HF energy from truncated versions of the AVTZ
and AVQZ basis sets. The E(HLC) term is an empirical “higher level
correction” term that depends on the number of paired and unpaired electrons.
The HLC empirical parameters are optimized to minimize the mean
absolute deviation (MAD) from the experimental determinations in the
G3/05 test set (124). Thus, the HLC term compensates for systematic deficiencies
in the electronic and nuclear components (and may also include
contributions from terms that are not explicitly included in the model, such 
as core-valence and scalar relativistic corrections). However, it is important 
to point out that the HLC term cancels out between reactants and products 
for chemical transformations involving only closed-shell species. Thus, 
G4(MP2) becomes nonempirical for the calculation of reaction energies 
of isogyric reactions—that is, reactions in which the number of spin pairs 
is conserved. The same is true for the calculation of reaction barrier heights 
in which the number of paired and unpaired electrons is conserved between 
the transition structure and reactant(s). The bottleneck step in G4(MP2) 
theory is typically the CCSD(T)/6-31G(d) calculation. Thus, G4(MP2) is 
one of the computationally most economical composite ab initio methods 
and can be applied to very large systems, most notably a series of C60 and
C40 isomers (125-127). We also note that some variants of the G4(MP2)
procedure (e.g., G4(MP2)-6X) (13) use the same energetic components 
as G4(MP2). Thus, the G4(MP2)-6X energy can be obtained from a 
G4(MP2) calculation at no additional computational cost. 
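
The cancellation of the HLC term in reactions between closed-shell species can be illustrated with a minimal Python sketch. It assumes a simplified HLC that is proportional to the number of valence electron pairs of a closed-shell molecule; the parameter value is a placeholder in arbitrary units, not a fitted G4(MP2) parameter, and the function names are hypothetical. Open-shell species and atoms, which carry different HLC parameters, are deliberately not covered.

```python
A_PLACEHOLDER = 1.0  # hypothetical value (arbitrary units), NOT a fitted G4(MP2) parameter

# Valence electron counts of the closed-shell species in the example reaction.
VALENCE_ELECTRONS = {"H2": 2, "C2H4": 12, "C2H6": 14}


def hlc_closed_shell(n_valence_electrons, a=A_PLACEHOLDER):
    """Assumed simplified HLC for a closed-shell molecule: -a per valence electron pair."""
    if n_valence_electrons % 2:
        raise ValueError("a closed-shell species must have an even number of electrons")
    return -a * (n_valence_electrons // 2)


def reaction_hlc(reactants, products):
    """Net HLC contribution to a reaction energy (products minus reactants)."""
    total_products = sum(hlc_closed_shell(VALENCE_ELECTRONS[s]) for s in products)
    total_reactants = sum(hlc_closed_shell(VALENCE_ELECTRONS[s]) for s in reactants)
    return total_products - total_reactants


# Hydrogenation of ethylene: C2H4 + H2 -> C2H6 (all species closed shell).
net = reaction_hlc(["C2H4", "H2"], ["C2H6"])
print(f"net HLC contribution: {net}")
```

Because the number of valence electron pairs is the same on both sides of the reaction, the net contribution vanishes for any value of the parameter, which is the sense in which the method becomes nonempirical for such reactions.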

A different type of hybrid CCSD(T)/MP2 composite ab initio method,
which does not involve an HLC term, is the ccCA family of methods. For 
example, the underlying expression for the nonrelativistic electronic energy 
in the ccCA-PS3 method is: 


$$E[\mathrm{ccCA}] = E[\mathrm{HF/CBS}] + E[\mathrm{MP2^{corr}/CBS}] + E[\mathrm{CCSD(T)/VTZ}] - E[\mathrm{MP2/VTZ}] + \Delta E[\mathrm{CV}] \tag{4}$$


Here, the HF and MP2 correlation energies are extrapolated from basis
sets of up to AVQZ, and the ΔE[CV] correction is taken as
E[MP2(AE)/ACVTZ] − E[MP2(FC)/AVTZ]. The bottleneck step in ccCA-PS3 is
typically the CCSD(T)/VTZ calculation. Thus, the ccCA-PS3 method has a
computational cost intermediate between W1 theory (with a CCSD/AVQZ
bottleneck step) and G4(MP2) (with a CCSD(T)/6-31G(d) bottleneck step).
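
As a minimal sketch of how Eq. (4) is assembled in practice, the function below adds up the individual components. The function name, argument names, and numerical values are illustrative assumptions; the energies are made-up placeholders in hartree, not results of actual calculations.

```python
def ccca_energy(e_hf_cbs, e_mp2_corr_cbs, e_ccsdt_vtz, e_mp2_vtz,
                e_mp2_ae_acvtz, e_mp2_fc_avtz):
    """Assemble the Eq. (4) energy from its components (all in hartree)."""
    # Higher-order correlation correction: CCSD(T) minus MP2 in the VTZ basis.
    delta_cc = e_ccsdt_vtz - e_mp2_vtz
    # Core-valence correction: all-electron MP2 minus frozen-core MP2.
    delta_cv = e_mp2_ae_acvtz - e_mp2_fc_avtz
    return e_hf_cbs + e_mp2_corr_cbs + delta_cc + delta_cv


# Placeholder inputs, chosen only to exercise the function.
e_total = ccca_energy(
    e_hf_cbs=-76.067,
    e_mp2_corr_cbs=-0.300,
    e_ccsdt_vtz=-76.332,
    e_mp2_vtz=-76.319,
    e_mp2_ae_acvtz=-76.390,
    e_mp2_fc_avtz=-76.329,
)
```

Only the CCSD(T)/VTZ input requires a coupled-cluster calculation; every other input comes from HF or MP2 calculations.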

2.2 Pure coupled-cluster-based composite ab initio methods


Composite ab initio methods based purely on coupled-cluster methods
can be divided into two general categories: CCSD(T)/CBS methods and
post-CCSD(T)/CBS methods. The computationally more economical
CCSD(T)/CBS methods can achieve confident chemical accuracy for
non-multireference systems and occupy the second rung of our proposed
Jacob’s Ladder of composite ab initio methods (Fig. 2). These methods
are applicable to medium-sized systems such as corannulene (C20H10)
(128), sumanene (C21H12) (128), dodecahedrane (C20H20) (129), and
carbon clusters (C20 and C24) (130). The post-CCSD(T) methods can
be divided into CCSDT(Q)/CBS and CCSDTQ5/CBS methods.
CCSDT(Q)/CBS methods can normally approach benchmark accuracy
and are applicable to systems such as benzene (36,131), and
CCSDTQ5/CBS methods can achieve sub-benchmark accuracy and are
applicable to smaller systems such as butane and tetrahedrane (103,106).

In purely coupled-cluster-based composite ab initio methods, the 
coupled-cluster energy is partitioned into the SCF and coupled-cluster 
correlation components. The notations that are commonly used for 
the coupled-cluster correlation components are listed in Table 3. The 
coupled-cluster energies that correspond to rungs 2, 3, and 4 of Jacob’s 
Ladder are given by the following equations: 


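Rung 2:

$$\mathrm{CCSD(T)/CBS} \approx \mathrm{HF}/\mathrm{CBS}_{\mathrm{HF}} + \Delta\mathrm{CCSD}/\mathrm{CBS}_{\Delta\mathrm{CCSD}} + \Delta\mathrm{(T)}/\mathrm{CBS}_{\Delta\mathrm{(T)}} \tag{5}$$

Rung 3:

$$\begin{aligned}
\mathrm{CCSDT(Q)/CBS} \approx{} & \mathrm{HF}/\mathrm{CBS}_{\mathrm{HF}} + \Delta\mathrm{CCSD}/\mathrm{CBS}_{\Delta\mathrm{CCSD}} + \Delta\mathrm{(T)}/\mathrm{CBS}_{\Delta\mathrm{(T)}} \\
& + \Delta\mathrm{T{-}(T)}/\mathrm{CBS}_{\Delta\mathrm{T-(T)}} + \Delta\mathrm{(Q)}/\mathrm{CBS}_{\Delta\mathrm{(Q)}}
\end{aligned} \tag{6}$$

Rung 4:

$$\begin{aligned}
\mathrm{CCSDTQ5/CBS} \approx{} & \mathrm{HF}/\mathrm{CBS}_{\mathrm{HF}} + \Delta\mathrm{CCSD}/\mathrm{CBS}_{\Delta\mathrm{CCSD}} + \Delta\mathrm{(T)}/\mathrm{CBS}_{\Delta\mathrm{(T)}} \\
& + \Delta\mathrm{T{-}(T)}/\mathrm{CBS}_{\Delta\mathrm{T-(T)}} + \Delta\mathrm{(Q)}/\mathrm{CBS}_{\Delta\mathrm{(Q)}} + \Delta\mathrm{Q{-}(Q)}/\mathrm{CBS}_{\Delta\mathrm{Q-(Q)}} \\
& + \Delta\mathrm{(5)}/\mathrm{CBS}_{\Delta\mathrm{(5)}} + \Delta\mathrm{5{-}(5)}/\mathrm{CBS}_{\Delta\mathrm{5-(5)}}
\end{aligned} \tag{7}$$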


where CBS_comp designates a single basis set or basis set extrapolation
scheme used for calculating each energetic component, depending on the
basis set convergence of that component (comp = HF, ΔCCSD, Δ(T),
ΔT–(T), etc.). The expansion in Eq. (7) approximates the FCI/CBS energy
and can include terms above Δ5–(5)/CBS_Δ5–(5) depending on the
magnitude of the Δ5–(5) term. This partitioning of the SCF and correlation
energy components is a highly efficient and effective approach for
obtaining the CCSD(T)/CBS, CCSDT(Q)/CBS, or FCI/CBS energies.

Table 3 Overview of the coupled-cluster contributions discussed in the present work.

Name                                   Definition              Abbreviation
Hartree-Fock                           HF energy               HF or SCF
Full-iterative connected doubles       CCSD − HF               ΔCCSD
Noniterative connected triples         CCSD(T) − CCSD          Δ(T)
Full-iterative connected triples       CCSDT − CCSD(T)         ΔT–(T)
Noniterative connected quadruples      CCSDT(Q) − CCSDT        Δ(Q)
Full-iterative connected quadruples    CCSDTQ − CCSDT(Q)       ΔQ–(Q)
Connected quadruples as a whole        CCSDTQ − CCSDT          ΔQ
Noniterative quintuples                CCSDTQ(5) − CCSDTQ      Δ(5)
Full-iterative connected quintuples    CCSDTQ5 − CCSDTQ(5)     Δ5–(5)
Connected quintuples as a whole        CCSDTQ5 − CCSDTQ        Δ5
Noniterative sextuples                 CCSDTQ5(6) − CCSDTQ5    Δ(6)
Full-iterative connected sextuples     CCSDTQ56 − CCSDTQ5(6)   Δ6–(6)
Connected sextuples as a whole         CCSDTQ56 − CCSDTQ5      Δ6
Post-CCSD(T) as a whole                CCn − CCSD(T)ᵃ          Post-CCSD(T)

ᵃ CCn = any post-CCSDT method, e.g., CCSDT(Q), CCSDTQ, CCSDTQ(5), etc.
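
The additive structure of Eqs. (5)–(7) can be sketched as a sum over a rung-dependent list of components. The dictionary keys, the function name, and the numerical values below are illustrative assumptions (the numbers are arbitrary placeholders, e.g., TAE contributions in kcal/mol), not data from any of the methods discussed.

```python
# Components entering each rung, following Eqs. (5)-(7).
RUNG_TERMS = {
    2: ["HF", "dCCSD", "d(T)"],
    3: ["HF", "dCCSD", "d(T)", "dT-(T)", "d(Q)"],
    4: ["HF", "dCCSD", "d(T)", "dT-(T)", "d(Q)",
        "dQ-(Q)", "d(5)", "d5-(5)"],
}


def composite_energy(components, rung):
    """Sum the components required for the given rung (2, 3, or 4)."""
    terms = RUNG_TERMS[rung]
    missing = [term for term in terms if term not in components]
    if missing:
        raise KeyError(f"missing components for rung {rung}: {missing}")
    return sum(components[term] for term in terms)


# Arbitrary placeholder contributions; each would be converged to the
# basis set limit with its own basis set or extrapolation.
components = {
    "HF": 100.0, "dCCSD": 50.0, "d(T)": 5.0,
    "dT-(T)": -0.5, "d(Q)": 0.6,
    "dQ-(Q)": -0.1, "d(5)": 0.05, "d5-(5)": -0.01,
}

rung2 = composite_energy(components, 2)
rung3 = composite_energy(components, 3)
rung4 = composite_energy(components, 4)
```

Because each component is stored separately, it can be computed with whichever basis set suits its convergence behavior before the final summation.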

A few general design features that are important to the success of 
coupled-cluster-based composite ab initio methods are: 

• All the energetic components (HF, ΔCCSD, Δ(T), ΔT–(T), Δ(Q), etc.)
  are converged separately to a common accuracy level, i.e., the basis set
  truncation error associated with each of the terms should be roughly
  the same: ε(CBS_HF) ≈ ε(CBS_ΔCCSD) ≈ ε(CBS_Δ(T)) ≈ ε(CBS_ΔT–(T)) ≈
  ε(CBS_Δ(Q)) ≈ …

• Smaller basis sets are needed for calculating higher-order correlation
  components since they increasingly reflect nondynamical rather than
  dynamical correlation.

• Each of the energetic components (HF, ΔCCSD, Δ(T), ΔT–(T), Δ(Q),
  etc.) may exhibit a different basis set convergence behavior and is
  converged to the CBS limit in the most effective manner (e.g., using
  an optimal extrapolation exponent or a scaling factor). The effective
  extrapolation exponents or scaling factors can be physically or
  empirically motivated.

• The HF and lower-level correlation components (e.g., ΔCCSD, Δ(T),
  and ΔT–(T)) are typically extrapolated to the CBS limit, whilst the
  higher-order correlation components (e.g., Δ(Q) and above) are
  typically calculated with a single basis set of double-ζ or triple-ζ
  quality.

• In many contemporary composite ab initio methods, the basis set
  convergence of the HF, ΔCCSD, and Δ(T) components is accelerated by
  explicitly correlated, density fitting, and local coupled-cluster
  techniques, keeping the computational cost to a minimum.
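
To make the role of the extrapolation exponent concrete, the sketch below implements a generic two-point extrapolation. It assumes the inverse-power form E(L) = E_CBS + A·L^(−α), where L is the cardinal number of the basis set; this functional form and the function name are assumptions made here for illustration, and the individual methods may use other formulas or exponents.

```python
def extrapolate_two_point(e_small, e_large, l_small, l_large, alpha):
    """CBS estimate assuming E(L) = E_CBS + A * L**(-alpha).

    e_small and e_large are the energies obtained with the basis sets of
    cardinal numbers l_small and l_large (l_small < l_large).
    """
    ratio = (l_large / l_small) ** alpha
    return e_large + (e_large - e_small) / (ratio - 1.0)


# Synthetic check: energies generated from the assumed form with
# E_CBS = -1.0, A = 0.5, alpha = 3 for L = 3 (triple-zeta) and L = 4.
e_cbs_true, a, alpha = -1.0, 0.5, 3.0
e_tz = e_cbs_true + a * 3 ** (-alpha)
e_qz = e_cbs_true + a * 4 ** (-alpha)
e_cbs = extrapolate_two_point(e_tz, e_qz, 3, 4, alpha)
```

The synthetic energies follow the assumed form exactly, so the function recovers E_CBS. For real data the result depends on how well the chosen exponent describes the convergence of the component in question.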

The above factors contribute to the computational efficiency and accuracy
of contemporary composite ab initio methods. Of particular importance to
the success of these methods is that contributions from successively higher
cluster expansion terms (i.e., ΔCCSD → Δ(T) → ΔT–(T) → Δ(Q) → ΔQ–(Q)
→ Δ(5) → etc.) tend to converge increasingly faster to the complete basis
set limit, since they increasingly reflect static rather than dynamical
correlation. This relationship between the n- and one-particle spaces is
illustrated schematically in Fig. 1B and is the main reason that
post-CCSD(T) composite ab initio methods are applicable to molecules with
nearly ten non-hydrogen atoms at a realistic computational cost (e.g.,
C2Cl6, SF6, and C6H6) (76,80,103,108).

Table 4 summarizes the basis sets used for extrapolating or calculating
the various components in two representative coupled-cluster-based
composite ab initio methods: Wn (33,34,36,76,80,87,90) and HEAT (46–50).
As can be seen, the basis sets used in the Wn and HEAT methods vary in a
systematic manner across the rungs of Jacob’s Ladder (rows in Table 4) and
correlation components (columns in Table 4). For example, in the original
Wn methods, the HF and ΔCCSD components are extrapolated from
the following basis sets: AV{T,Q}Z (W1 theory, rung 2), AV{Q,5}Z
(W3 theory, rung 3), and AV{5,6}Z (W4 theory, rung 4). As discussed
above, smaller basis sets are used for calculating successively higher-order
correlation components. For example, in W4 theory, the following basis sets
(or basis set extrapolations) are used: AV{5,6}Z (ΔCCSD), AV{Q,5}Z
(Δ(T)), V{D,T}Z (ΔT–(T)), VTZ (Δ(Q)), VDZ (ΔQ–(Q)), and VDZ(sp)
(Δ5) (where VDZ(sp) is a truncated version of the VDZ basis set) (33).
We note that the largest basis set used for extrapolating the HF and
ΔCCSD components is the same in all cases. From a computational cost
perspective, it would make little sense to extrapolate the HF component
from smaller basis sets than the ΔCCSD component; however, in some cases
(e.g., W4-F12 and diet-HEAT-F12) there is no need to extrapolate the HF
component, and it is simply calculated using the larger basis set.

Table 4 Overview of the basis sets used for extrapolating or calculating the HF and correlation components in representative variants of the
Wn and HEAT composite ab initio methods.

Name           CCn/CBSᵃ  HF           ΔCCSD         Δ(T)          ΔT–(T)    Δ(Q)      ΔQ–(Q)  Δ(5) or Δ5  Δ(6)
W1             CCSD(T)   AV{T,Q}Z     AV{T,Q}Z      AV{D,T}Z
W2             CCSD(T)   AV{Q,5}Z     AV{Q,5}Z      AV{T,Q}Z
W3             CCSDT(Q)  AV{Q,5}Z     AV{Q,5}Z      AV{T,Q}Z      V{D,T}Z   VDZ
W4lite         CCSDT(Q)  AV{5,6}Z     AV{5,6}Z      AV{Q,5}Z      V{D,T}Z   VDZ
W4             CCSDTQ5   AV{5,6}Z     AV{5,6}Z      AV{Q,5}Z      V{D,T}Z   VTZ       VDZ     VDZ(sp)
W4.3           CCSDTQ56  AV{5,6}Z     AV{5,6}Z      AV{Q,5}Z      V{T,Q}Z   V{T,Q}Z   VTZ     VDZ         VDZ(sp)
W1-F12         CCSD(T)   V{D,T}Z-F12  V{D,T}Z-F12   AV{D,T}Z
W2-F12         CCSD(T)   V{T,Q}Z-F12  V{T,Q}Z-F12   VTZ-F12
W3-F12         CCSDT(Q)  V{T,Q}Z-F12  V{T,Q}Z-F12   VTZ-F12       V{D,T}Z   VDZ
W4-F12         CCSDTQ5   V5Z-F12      V{Q,5}Z-F12   AV{Q,5}Z      V{D,T}Z   VTZ       VDZ     VDZ(sp)
HEAT-345(Q)    CCSDT(Q)  ACV{T,Q,5}Z  ACV{Q,5}Z     ACV{Q,5}Z     V{T,Q}Z   VDZ
HEAT-456QP     CCSDTQ5   ACV{Q,5,6}Z  ACV{5,6}Z     ACV{5,6}Z     V{T,Q}Z   VDZ       VDZ     VDZ
diet-HEAT      CCSDT(Q)  AV{T,Q,5}Z   AV{Q,5}Z      AV{Q,5}Z      VTZ       VDZ
diet-HEAT-F12  CCSDT(Q)  VQZ-F12      V{T,Q}Z-F12   V{T,Q}Z-F12   VTZ(spd)  VDZ

ᵃ CCn = Coupled-cluster excitation level being approximated.

A notable difference between the Wn and HEAT theories is that in the
Wn methods the Δ(T) and the ΔCCSD correlation components are
extrapolated separately to the complete basis set limit, whereas in the
HEAT methods the ΔCCSD(T) component is extrapolated to the CBS limit as a
whole. In the Wn methods, the Δ(T) component is typically extrapolated
from basis sets with one cardinal number smaller than those used for
extrapolating the ΔCCSD component. This separation makes the CCSD(T)/CBS
Wn methods computationally more economical. For example, W1 and
W1-F12 theories have been applied to systems as large as arginine
(C6H14N4O2) (132), terphenyl (C18H14) (133), corannulene (C20H10)
(128), dodecahedrane (C20H20) (129), and sumanene (C21H12) (125).
Following the same trend of the HF, ΔCCSD, and Δ(T) components, the
largest basis set used for extrapolating or calculating the ΔT–(T) component
is smaller by 1–2 cardinal numbers than the largest basis set used for
extrapolating the Δ(T) component. In all the CCSDT(Q)/CBS methods in
Table 4, the Δ(Q) component is calculated with the VDZ basis set. In the
CCSDTQ5/CBS methods in Table 4, the ΔQ–(Q) component is calculated
with the VDZ basis set (except for W4.3 theory). We note that the principal
bottleneck in applying post-CCSD(T)/CBS methods to large systems is
typically the evaluation of the ΔT–(T) term in CCSDT(Q)/CBS methods
and the ΔQ–(Q) term in CCSDTQ5/CBS methods.

Another important distinction between the Wn and HEAT methods
is the treatment of the core electrons in the CCSD(T) calculations. In
the HEAT methods, all electrons are correlated in the CCSD(T)
calculations. This means that all-electron CCSD(T)/ACV5Z (HEAT-345(Q))
and CCSD(T)/ACV6Z (HEAT-456QP) calculations are performed.
Importantly, this eliminates the error associated with the CCSD(T)-based
core-valence correction; however, it significantly increases the
computational cost of the CCSD(T) calculations, in particular for molecules
containing second-row atoms. In the Wn methods, on the other hand, the
ΔCCSD and Δ(T) terms are obtained within the frozen-core approximation.
That is, the inner-shell orbitals (1s for first-row atoms, and 1s, 2s,
and 2p for second-row atoms) are constrained to be doubly occupied in
all configurations. In practice, this partitioning between the inner- and
valence-shell electrons makes Wn theories applicable to molecules with
multiple second-row atoms at a realistic computational cost. For example,
W4lite theory has been applied to molecules such as S8 (134), and W1-F12
theory has been applied to systems as large as P4S10 (135). To account for
inner-shell correlation, the Wn theories include a core-valence (CV)
correction term. The CV correction is obtained at the CCSD(T)/
APWCV{T,Q}Z level in W2, W3, and W4 theories. It has been found that
this level of theory provides an excellent balance between accuracy and
computational cost with a root-mean-square deviation (RMSD) of merely
0.03 kcal mol⁻¹ for the diverse set of 200 small first- and second-row
molecules in the W4-17 database (103,136). In high-level theories such
as W4, this RMSD is comparable to the errors in the valence parts and to
post-CCSD(T) contributions to the CV component. For example, the
ΔT–(T) correction to the CV component increases the atomization energies
of HO; and O; by 0.02 and 0.03 kcal mol⁻¹, respectively (33).


2.3 Correlation contributions beyond the CCSD(T) level 


Post-CCSD(T) correlation contributions are of key importance for
achieving chemical accuracy for multireference systems and for achieving
sub-benchmark accuracy for non-multireference systems. It is therefore
useful to gain an understanding of the magnitude of the post-CCSD(T)
correlation contributions relative to the CCSD(T) energy. For this purpose,
we consider the W4-17 database of highly accurate TAEs of 200 organic
and inorganic species (36). Most of the TAEs in the W4-17 database have
been obtained at the CCSDTQ5/CBS level of theory from W4 theory. For
a set of 46 small molecules (e.g., acetylene, CH4, and ClO), the TAEs have
been obtained at the CCSDTQ56/CBS level of theory from W4.3 and
W4.4 theories, whereas for a subset of 33 larger molecules (e.g., benzene,
SF6, and C2Cl6) the TAEs have been obtained at the CCSDT(Q)/CBS level
of theory from W4lite theory. Overall, the W4-17 dataset includes
molecules with up to eight non-hydrogen atoms, which cover a broad spectrum
of bonding situations, electronic states, and multireference character.
Table 5 gives an overview of the spread of the electronic SCF, ΔCCSD,
Δ(T), ΔT–(T), ΔQ, Δ5, and Δ6 contributions for the W4-17 database in
terms of the mean, standard deviation, and min/max values. The largest
contribution to the TAEs is normally obtained at the SCF level. The mean
SCF contribution to the TAEs is 270.6 kcal mol⁻¹ with a standard deviation
of 235.5 kcal mol⁻¹. However, for the larger molecules, the SCF
contribution well exceeds 1000 kcal mol⁻¹, with the largest contribution
of 1239.8 kcal mol⁻¹ for pentane. It is interesting to note that highly
multireference systems such as BN, O3, FO, F2, F2O, FO2, F2O2, ClO2,
ClO3, ClF3, and ClF5 are unbound at the SCF level, i.e., the SCF
contribution to the TAE is negative. Single and double excitations from the
Hartree-Fock configuration constitute the largest contribution to the
correlation energy. The mean ΔCCSD contribution for the W4-17 set
is 123.7 kcal mol⁻¹ with a standard deviation of 62.1 kcal mol⁻¹. The
Δ(T) contribution to the TAEs is still large, with a mean and standard
deviation of 11.9 and 7.0 kcal mol⁻¹, respectively. For over 50% of the
species in the W4-17 database, the Δ(T) contribution exceeds
10.0 kcal mol⁻¹, and the maximum Δ(T) contribution reaches
42.8 kcal mol⁻¹ for N2O4. These statistical measures illustrate why all
the composite ab initio methods on Jacob’s Ladder (Fig. 2) must explicitly
include the Δ(T) contribution. It is well established that the higher-order
triples contributions (ΔT–(T)) tend to reduce the TAEs. Indeed, 94% of the
ΔT–(T) contributions are negative and the positive contributions are close
to zero, i.e., they are smaller than +0.3 kcal mol⁻¹. The mean and standard
deviation of the ΔT–(T) contribution are −0.85 and 0.68 kcal mol⁻¹,
respectively. However, this contribution can reach a maximum negative
value of −3.3 kcal mol⁻¹ for P4. The quadruple excitations, on the other
hand, universally increase the TAEs (i.e., all the ΔQ contributions are
positive). The mean value of the ΔQ contributions is 0.97 kcal mol⁻¹ with a
standard deviation of 0.77 kcal mol⁻¹. Thus, the positive ΔQ contributions
have a similar magnitude to the (mostly) negative ΔT–(T) contributions. For
this reason, the complete neglect of contributions beyond the CCSD(T) level
is one of the most successful yet computationally economical approaches
in quantum chemistry, and the CCSD(T) method is rightly referred to as
the gold standard of quantum chemistry. Nevertheless, as we will see in
the following paragraph, this gold standard is only applicable in the
accuracy range of ±1 kcal mol⁻¹. In the context of sub-kcal/mol accuracies,
the CCSDT(Q) (or CCSDTQ) method is more appropriately referred to as
the gold standard.

Table 5 Overview of the magnitude (in kcal mol⁻¹) and signs of the
electronic energetic contributions from W4.x theory for the set of 200
TAEs in the W4-17 database.

Component   Meanᵃ    SDᵃ      Largestᵃ
SCF         270.63   235.46   1239.8 (C5H12)
ΔCCSD       123.73   62.12    334.4 (C5H12)
Δ(T)        11.92    6.99     42.8 (N2O4)
ΔT–(T)      −0.85    0.68     −3.3 (P4)
ΔQ          0.97     0.77     4.8 (S4)
Δ5          0.06     0.07     0.4 (O3)
Δ6          0.01     0.01     0.1 (C2)

ᵃ Mean = mean value; SD = standard deviation; Largest = largest positive or
negative value.

Inspection of the Δ5 contributions to the TAEs in the W4-17 database
reveals that these contributions are typically smaller than 0.1 kcal mol⁻¹
for all but highly multireference systems. For multireference systems,
however, the Δ5 contributions are still significant at the sub-kcal/mol
level. For example, they range between 0.2 and 0.4 kcal mol⁻¹ for systems
like SO3, ClO3, FO2, NCCN, FO», S4, C2, and O3. The Δ6 contributions to the
TAEs are very small and certainly negligible for all but highly
multireference systems. The largest Δ6 contributions are 0.04 and
0.06 kcal mol⁻¹ for BN and C2, respectively (137,138). The Δ7 contributions
to the TAEs are practically nil, being 0.003 kcal mol⁻¹ for C2 (137).

The W4-17 database includes 200 CCSDT(Q), CCSDTQ5, and
CCSDTQ56 TAEs for molecules with up to eight non-hydrogen atoms,
which cover a broad spectrum of bonding situations, electronic states,
and multireference character. As such, it is an excellent resource for
quantitative evaluation of the accuracy that can be expected from
CCSD(T)-based methods on the first two rungs of Jacob’s Ladder. Fig. 3
gives an overview of the post-CCSD(T) contributions to the 200 TAEs in
the W4-17 database. Post-CCSD(T) contributions to the TAEs tend to be
evenly distributed between positive and negative values, albeit there are
slightly more positive than negative values. Overall, 58% of the
post-CCSD(T) contributions to the TAEs are positive and 42% are negative.
Importantly, the highly multireference systems all have large and positive
post-CCSD(T) contributions ranging between 1.0 and 3.5 kcal mol⁻¹.
For these systems, the positive ΔQ contributions are significantly larger
than the negative ΔT–(T) contributions. In particular, for eight systems,
the overall post-CCSD(T) contributions range between 1 and 2 kcal mol⁻¹,
namely 1.1 (ClF5, NO3), 1.2 (S3), 1.3 (N2O4), 1.4 (B2, ClNO), and 1.8
(F2O2, cis-HO3) kcal mol⁻¹. For five systems, the post-CCSD(T)
contributions to the TAEs range between 2.0 and 3.5 kcal mol⁻¹, namely 2.3
(trans-HO3), 2.4 (S4), 2.9 (O3), 3.0 (FO2), and 3.5 (ClO3) kcal mol⁻¹.

Fig. 3 Overview of post-CCSD(T) contributions to the 200 total atomization energies in
the W4-17 database (in kcal mol⁻¹).


The post-CCSD(T) contributions for the lion’s share of the TAEs (93%)
are confined between ±1.0 kcal mol⁻¹. Furthermore, for 76% of the TAEs,
the post-CCSD(T) contributions are confined between ±0.5 kcal mol⁻¹.
Thus, it is fair to say that the CCSD(T)/CBS level of theory is indeed,
on average, a “gold standard” for systems dominated by mild-to-moderate
multireference effects. Let us examine more closely the systems with
relatively large post-CCSD(T) contributions ranging between −0.5 and
−1.0 and between +0.5 and +1.0 kcal mol⁻¹. This set consists of 17% of
the systems in the W4-17 database. The subset with positive
post-CCSD(T) contributions ranging between +0.5 and +1.0 kcal mol⁻¹
includes molecules that are characterized by moderate-to-severe
multireference effects (e.g., BN, CN, NCCN, O2, HO2, F2, N2O, F2O,
P2, ClO, Cl2O2, and ClO3). On the other hand, the subset of molecules
with negative post-CCSD(T) contributions ranging between −0.5 and
−1.0 kcal mol⁻¹ includes mostly hydrocarbons that are not normally
associated with strong multireference effects (e.g., cyclobutene, n-pentane,
furan, 1,3-dithiotane, tetrahedrane, cyclopentadiene, pyrrole, and thiophene).
We note that tetrachloroethylene and benzene both have large negative
post-CCSD(T) contributions of −0.99 kcal mol⁻¹ and that hexachloroethane
has a post-CCSD(T) contribution of −1.7 kcal mol⁻¹. It has been
previously noted that medium-sized hydrocarbons are associated with
post-CCSD(T) contributions that can exceed half a kcal mol⁻¹ even though
they are clearly dominated by a single reference configuration (84,139).
Examining the ΔT–(T) and ΔQ contributions for the above medium-sized
hydrocarbons, we find that the negative higher-order triples (ΔT–(T))
contributions are larger than the positive connected quadruple
contributions. This is in contrast to systems that are dominated by
multireference effects, for which the positive ΔQ contribution outweighs
the negative ΔT–(T) contribution. There are some indications that this
effect correlates with the size of the system. For example, for the series
of saturated n-alkanes, the post-CCSD(T) contributions increase linearly in
magnitude with the size of the system, namely they amount to −0.13
(ethane), −0.28 (propane), −0.40 (n-butane), −0.54 (n-pentane), and −0.65
(n-hexane) kcal mol⁻¹ (139). We note that the squared correlation
coefficient between the post-CCSD(T) contributions and the number of
carbons in these alkanes is R² = 0.9991.
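
The linear trend for the n-alkanes can be checked with an ordinary least-squares fit to the five values quoted above. The sketch below uses only those rounded values, so its R² (about 0.998) is close to, but not identical with, the quoted 0.9991, presumably because the published fit was based on unrounded contributions. The function and variable names are illustrative.

```python
# Post-CCSD(T) contributions (kcal/mol) for ethane through n-hexane,
# taken from the rounded values quoted in the text.
n_carbons = [2, 3, 4, 5, 6]
post_ccsdt = [-0.13, -0.28, -0.40, -0.54, -0.65]


def linear_fit(x, y):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(n_carbons, post_ccsdt)
```

The fitted slope is −0.13 kcal/mol per carbon atom, i.e., each additional carbon makes the post-CCSD(T) contribution more negative by roughly that amount.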

The above discussion illustrates that multireference effects are not the 
only factor affecting the magnitude of post-CCSD(T) contributions and 
that the sign of the overall post-CCSD(T) contribution to the TAEs may 
indicate whether multireference or size effects are dominant. Systems 
with relatively large positive post-CCSD(T) contributions (+0.5 to
+1.0 kcal mol⁻¹) are indicative of multireference character, whereas systems
with relatively large negative post-CCSD(T) contributions (−0.5 to
−1.0 kcal mol⁻¹) may be indicative of size effects.


3. Secondary energetic corrections 
3.1 The 3D Pople diagram of composite ab initio methods 


Fig. 1 shows how the one-particle and n-particle spaces converge to the 
exact nonrelativistic electronic energy. However, energies and chemical 
properties calculated on the nonrelativistic electronic potential energy
surface are not directly comparable to those obtained from experiments. 
To reproduce accurate experimental energetic and spectroscopic properties, 
secondary energetic contributions must be considered. These contributions 
may include spin-orbit, scalar relativistic, zero-point vibrational energy, 
Born–Oppenheimer, thermal, and entropic corrections. In certain cases,
additional corrections may be needed, for example, conformational
corrections to the enthalpy for floppy molecules (132,140), or tunneling
contributions for reaction barrier heights involving hydrogen transfer (or
heavy-atom transfers at low temperatures) (141). Fig. 4 gives a complete
overview of the components involved in composite ab initio methods. 
The front face of the three-dimensional Pople diagram (red arrows) repre- 
sents the two-dimensional convergence of the one-particle and n-particle 
spaces, whereas the third dimension (green arrow) represents any secondary 
energetic contributions that are needed for a meaningful comparison 
with experimentally observable energetic and spectroscopic quantities. In 
principle, any secondary energetic component that can reasonably affect 
the molecular binding energies at the target level of accuracy should be 
explicitly (or implicitly) included in the third dimension of the 
composite method. 

The nonrelativistic electronic energy typically accounts for about 95%
of the relativistic, all-electron, ZPVE-inclusive TAE. For a detailed
discussion of the various secondary energetic contributions, see Refs.
(62,67,76,81,86–89,92). Here, we will focus on the relative magnitudes
of the various secondary energetic contributions in the highly diverse
W4-17 database. This benchmark dataset includes 200 all-electron,
relativistic, ZPVE-inclusive, and DBOC-inclusive TAEs obtained mostly at the
CCSDTQ5/CBS level from W4 theory. The W4-17 dataset includes
first- and second-row molecules with up to eight non-hydrogen atoms
and covers a broad spectrum of bonding situations and electronic states
(36). Table 6 gives an overview of the magnitude and spread of the
secondary energetic contributions for the entire W4-17 database in terms of
the mean, standard deviation, and maximum values.

Fig. 4 A three-dimensional Pople diagram illustrating the components comprising
composite ab initio methods. The red axes represent the two-dimensional convergence
of the nonrelativistic electronic energy. The green axis represents any additional
energetic contributions (e.g., scalar relativistic, spin-orbit, zero-point vibrational
energy, and Born–Oppenheimer corrections) that are needed for a meaningful comparison
with experimentally observable quantities.


3.2 The core-valence correction 


We begin by noting that some lower-level composite ab initio methods
(e.g., G4(MP2)) do not include a CV term and rely on fortuitous error
cancellation (112), whilst in some high-level methods (e.g., HEAT), a
CV term is not needed since the ΔCCSD and Δ(T) components are
calculated with all electrons correlated. Yet, most composite ab initio
methods (e.g., CBS-APNO, G4, ccCA, Wn, Wn-F12, and FPD) obtain the
CCSD(T)/CBS energy with only the valence electrons correlated
(CCSD(T)_val) and include a core-valence correction ΔCV to approximate the
all-electron CCSD(T) energy (CCSD(T)_all):

$$\mathrm{CCSD(T)_{all}/CBS} \approx \mathrm{CCSD(T)_{val}/CBS} + \Delta\mathrm{CV} \tag{8}$$


Table 6 Overview of the magnitude (in kcal mol⁻¹) and signs of the
electronic and secondary energetic contributions from W4 theory for
a set of 200 TAEs of first- and second-row molecules in the W4-17
database.

Component        Meanᵃ    SDᵃ      Largestᵃ
Core-valence     1.56     1.48     7.4 (C6H6)
Scalar relativ.  −0.55    0.44     −3.2 (SF6)
Spin-orbit       −0.74    0.78     −5.2 (C2Cl6)
DBOC             0.06     0.07     0.3 (C5H12)
ZPVE             17.61    16.97    99.5 (C5H12)

ᵃ Mean = mean value; SD = standard deviation; Largest = largest positive or
negative value.

This approach significantly reduces the computational cost of the
demanding CCSD(T)/CBS calculations and is essential for composite ab initio
methods that are applicable to molecules containing several second-row
(or heavier) elements. In most composite ab initio methods that belong
to the first rung of Jacob’s Ladder (e.g., G3, G4, ccCA-PS3, and W1X-n)
the ΔCV term is calculated at the MP2 level (ΔCV = MP2_all − MP2_val).
In methods that belong to the second rung, the ΔCV term is usually
calculated at the CCSD(T) level (ΔCV = CCSD(T)_all − CCSD(T)_val). For
example, in the original W1 and W2 theories, the ΔCV term is calculated
at the CCSD(T)/MTsmall level of theory, where MTsmall is a completely
decontracted VTZ basis set with additional tight 2d1f functions (31).
In methods that belong to the third and fourth rungs, the CV term is
usually obtained at the CCSD(T)/CBS level of theory (ΔCV =
CCSD(T)_all/CBS − CCSD(T)_val/CBS). For example, in W4lite and W4 theories,
the CV correction is extrapolated from the APWCV{T,Q}Z basis set pair
(33). Finally, in some high-level methods on the fourth rung (e.g., W4.2
and W4.3), the CV correction is obtained at the CCSDT level (33),
and in W4.4, it is obtained at the CCSDT(Q) level (34). For a detailed
discussion of the basis set and method dependencies of the core-valence
(and core-core) contributions to TAEs see Ref. (139).
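
The definition ΔCV = E_all − E_val carries over directly to atomization energies: the CV contribution to a TAE is the difference between the TAE computed with all electrons correlated and the TAE computed with only the valence electrons correlated, both at the same level of theory. The sketch below illustrates this bookkeeping; the function names and all energies are made-up placeholders (in hartree), not computed values.

```python
HARTREE_TO_KCAL = 627.5095  # approximate conversion factor


def tae(atom_energies, molecule_energy):
    """Total atomization energy: sum of atomic energies minus molecule."""
    return sum(atom_energies) - molecule_energy


def cv_correction_to_tae(atoms_all, mol_all, atoms_val, mol_val):
    """Delta-CV contribution to the TAE: TAE(all) - TAE(valence-only)."""
    return tae(atoms_all, mol_all) - tae(atoms_val, mol_val)


# Placeholder energies for a CH4-like molecule (one C and four H atoms).
# Hydrogen has no core electrons, so its energy is the same in both sets.
atoms_val = [-37.7800] + [-0.5000] * 4
atoms_all = [-37.8300] + [-0.5000] * 4
mol_val = -40.4400
mol_all = -40.4920

delta_cv = cv_correction_to_tae(atoms_all, mol_all, atoms_val, mol_val)
delta_cv_kcal = delta_cv * HARTREE_TO_KCAL
```

The same two functions apply whether the underlying energies come from MP2, CCSD(T), or higher-level calculations; only the inputs change with the rung of the method.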

The core-valence corrections in the W4-17 database have been calculated
at the CCSD(T)/APWCV{T,Q}Z level of theory (with post-CCSD(T)
contributions included for some of the smaller systems).
Except for a small number of highly polar systems (e.g., ClF5, SF6, AlCl3)
for which the CV correction is repulsive, the CV term is attractive for nearly
all species. The largest CV corrections in the W4-17 database are obtained
for medium-sized hydrocarbon/heteroatom first-row molecules, namely
4.4 (cyclobutadiene), 4.6 (cyclobutene and cyclobutene), 4.8 (n-butane
and trans-butadiene), 5.1 (tetrahedrane and furan), 5.6 (pyrrole), 5.8 (borole),
6.0 (n-pentane and cyclopentadiene), and 7.4 (benzene) kcal mol⁻¹. Thus,
this term clearly cannot be neglected and has to be treated at a sufficiently
high level of theory for quantitative chemical accuracy (36,136,142).

It is instructive to examine how the CV correction varies with molecular
size for a systematic series of hydrocarbons of increasing size. Fig. 5 gives an
overview of the CV contributions for two such series: (i) small straight-chain
alkanes with up to five carbon atoms and (ii) (CH)n hydrocarbon cages
with up to 20 carbon atoms. The CV corrections for the straight-chain
alkanes are taken from W4 theory (103), and those for the (CH)n
hydrocarbon cages are taken from W1-F12 theory (129). For the homologous
series of straight-chain alkanes (methane, ethane, propane, n-butane, and
n-pentane), there is a perfect linear correlation between the number of
carbons in the alkane and the magnitude of the CV correction with a
squared correlation coefficient of R² = 1.0000 (Fig. 5A). For methane,
the CV correction already exceeds 1 kcal mol⁻¹, i.e., it is 1.3 kcal mol⁻¹
at the CCSD(T)/APWCV{T,Q}Z level of theory, and each additional
CH2 group increases the CV contribution by ~1.2 kcal mol⁻¹ up to a
contribution of 6.0 kcal mol⁻¹ for n-pentane. The nearly constant increase in
the magnitude of the CV correction with the number of CH2 groups in
linear alkanes has been previously noted by Dixon and co-workers, where
a slightly lower increase of ~1.1 kcal mol⁻¹ per CH2 group was obtained at
the CCSD(T)/PWCVTZ level of theory (143).

Fig. 5 Overview of core-valence contributions to TAEs (in kcal mol⁻¹) for a series of
(A) linear alkanes (methane, ethane, propane, n-butane, n-pentane) from W4 theory
and (B) platonic/prismatic (CH)n hydrocarbon cages: tetrahedrane (CH)4, triprismane
(CH)6, cubane (CH)8, pentaprismane (CH)10, octahedrane (CH)12, and dodecahedrane
(CH)20 from W1-F12 theory.
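
The near-constant increment for the linear alkanes can be turned into a rough estimator. The sketch below derives the average increment from the two end points quoted in the text (1.3 kcal/mol for methane and 6.0 kcal/mol for n-pentane) and compares the estimate for n-butane with the quoted value of 4.8 kcal/mol. The function name is illustrative, and values estimated for alkanes not quoted in the text are extrapolations, not W4 results.

```python
CV_METHANE = 1.3     # kcal/mol, quoted in the text
CV_N_PENTANE = 6.0   # kcal/mol, quoted in the text

# Average increment per added carbon between methane (1 C) and
# n-pentane (5 C).
INCREMENT = (CV_N_PENTANE - CV_METHANE) / (5 - 1)


def estimate_cv(n_carbons):
    """Linear estimate of the CV contribution to the TAE of an n-alkane."""
    return CV_METHANE + INCREMENT * (n_carbons - 1)


cv_butane = estimate_cv(4)    # compare with the quoted 4.8 kcal/mol
cv_pentane = estimate_cv(5)   # reproduces the end point by construction
```

The average increment obtained this way (about 1.18 kcal/mol) is consistent with the ~1.2 kcal/mol per carbon stated above.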

For the series of (CH)n hydrocarbon cages (Fig. 5B), we still obtain an
almost perfect linear correlation (R² = 0.9976) between the magnitude
of the CV correction and the number of carbons in the (CH)n cage.
However, it is important to note that this is not a true homologous
series. Namely, this series is composed of three platonic hydrocarbons
(tetrahedrane (CH)4, cubane (CH)8, and dodecahedrane (CH)20), two
prismatic hydrocarbons (triprismane (CH)6 and pentaprismane (CH)10),
and one truncated tetrahedrane (octahedrane (CH)12). We note that the
W1-F12 CV correction for tetrahedrane (5.09 kcal mol⁻¹) (129) is practically
identical to that obtained at the W4 level (5.08 kcal mol⁻¹) (36). The CV
correction for triprismane (7.0 kcal mol⁻¹) is on the same order of
magnitude as that for its isomer benzene (7.4 kcal mol⁻¹). Similarly to the
case of the linear alkanes, with each addition of a CH unit, the CV
correction increases by roughly 1.2 kcal mol⁻¹, up to a contribution of
23.0 kcal mol⁻¹ for dodecahedrane (CH)20.


3.3 Relativistic corrections 


Table 6 gives an overview of the magnitude of the scalar relativistic effects in
the W4-17 database. All the scalar relativistic corrections in the W4-17
database are calculated using the second-order Douglas–Kroll–Hess
(DKH) approximation (144,145), which has been shown to yield results
in close agreement with the full relativistic treatment for first- and
second-row systems (146,147). The scalar relativistic contributions in the
W4-17 database are calculated at the CCSD(T)/AVQZ-DK level of theory.
The scalar relativistic corrections to the TAEs are universally repulsive.
Relatively small contributions below 1 kJ mol! are obtained for diatomic 
molecules (e.g., C2, N2, O2, Fo, Cl, CO, NO, FO, and ClO) and small 
hydrides (e.g., BH, BH, BsH,, CH, CH>, CH3, CH4, NH, NHb, and 
NH3), but also for oxygen-rich species like ozone and halogen oxides 
(e.g., FO, CLO, FO2, ClOz, and F,O02). The largest contributions in 
the W4-17 database are obtained for polyhalogenated compounds with
a central second-row atom, for example −3.2 (SF6), −2.7 (HClO4),
−2.6 (PF5), −1.9 (SO3, SiF4), −1.6 (ClO3), −1.3 (HClO3, AlF3, AlCl3),
and −1.0 (PF3) kcal mol⁻¹, as well as for fluorocarbon and chlorocarbon
compounds: −1.3 (C2F6) and −1.1 (C2Cl6, C2Cl4, C2F4, and
cis/trans-C2F2Cl2) kcal mol⁻¹. Fig. 6 gives an overview of the scalar relativistic
correction for the series of n-alkanes (up to n-pentane) and hydrocarbon 
cages (up to dodecahedrane). As is the case for the CV correction 
(Fig. 5), for both series, there is a near-perfect linear correlation between 
the scalar relativistic correction and the number of carbons in the system. 
The squared correlation coefficient is 0.9999 for the alkanes and 0.9995 
for the hydrocarbon cages (Fig. 6). For the alkanes, the relativistic
corrections range between −0.2 (methane) and −1.0 (n-pentane) kcal mol⁻¹,
whereas for the (CH)n cages, they range between −0.8 (tetrahedrane) and
−3.7 (dodecahedrane) kcal mol⁻¹. Again, it should be noted that the
nearly constant increase in the magnitude of the relativistic correction
with the number of CH2 groups in linear alkanes has been previously noted
by Dixon and co-workers (143).

Fig. 6 Overview of scalar relativistic and first-order spin-orbit contributions to TAEs
(in kcal mol⁻¹) for a series of (A) linear alkanes (methane, ethane, propane, n-butane,
n-pentane) from W4 theory; and (B) platonic/prismatic (CH)n hydrocarbon cages
(tetrahedrane (CH)4, triprismane (CH)6, cubane (CH)8, pentaprismane (CH)10,
octahedrane (CH)12, and dodecahedrane (CH)20) from W1-F12 theory.


With the exception of composite ab initio methods developed specifically
for treating heavy main-group, transition-metal, and f-block systems,
most methods consider only first-order atomic and molecular spin-orbit
corrections. These corrections are nonzero for radicals in a degenerate
ground state and can make nontrivial contributions to the molecular binding
energies (Table 6). The atomic spin-orbit corrections for first- and
second-row elements amount to 0.029 (B), 0.085 (C), 0.223 (O), 0.385
(F), 0.214 (Al), 0.428 (Si), 0.560 (S), and 0.841 (Cl) kcal mol⁻¹ (33).
Thus, the largest spin-orbit corrections in the W4-17 database are obtained
for compounds with multiple oxygen, fluorine, and second-row atoms.
Prominent examples with spin-orbit corrections in excess of 1 kcal mol⁻¹
include C2Cl6 (−5.2), C2Cl4 (−3.5), SF6 (−2.9), ClF5 (−2.8),
AlCl3 (−2.7), C2F6 (−2.5), S4 (−2.2), Cl2O2 (−2.1), SiF4 (−2.0),
Cl2 (−1.7), CF4 (−1.6), ClO3 (−1.5), SO3 (−1.2), and S2 (−1.1) kcal mol⁻¹.
For closed-shell hydrocarbons, the first-order spin-orbit correction to the
TAE increases more slowly with the molecular size; however, it clearly
cannot be neglected at the chemical accuracy level. For example, it amounts
to just over 0.5 kcal mol⁻¹ for benzene, and just over 1 kcal mol⁻¹ for
octahedrane. Molecular spin-orbit corrections can make nontrivial contributions
even for first- and second-row radicals and have to be taken into
account for quantitative chemical accuracy. For example, they amount to
0.11 (CF), 0.18 (NO), 0.20 (OH, SiH), 0.23 (SiF), 0.28 (OF), 0.45
(ClO), and 0.54 (HS) kcal mol⁻¹ (33).
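
For a closed-shell molecule, the first-order spin-orbit correction to the TAE discussed above is simply the negative sum of the atomic corrections of its constituent atoms. The following minimal sketch illustrates this bookkeeping using the atomic values quoted in the text. The function names and the simple formula parser are hypothetical helpers, and the zero entries for H, N, and P are an assumption (these atoms are not listed in the text; their S-type ground states carry no first-order splitting). Molecular spin-orbit terms for radicals are deliberately not included.

```python
import re
from typing import Dict

# Atomic first-order spin-orbit corrections (kcal/mol) quoted in the text (Ref. 33).
# Assumption: H, N, and P are not listed in the text and are set to zero here
# because their S-type ground states have no first-order spin-orbit splitting.
ATOMIC_SO: Dict[str, float] = {
    "B": 0.029, "C": 0.085, "O": 0.223, "F": 0.385,
    "Al": 0.214, "Si": 0.428, "S": 0.560, "Cl": 0.841,
    "H": 0.0, "N": 0.0, "P": 0.0,
}


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a simple formula without brackets, e.g. 'C2Cl6', into element counts."""
    counts: Dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def atomic_so_correction_to_tae(formula: str) -> float:
    """Spin-orbit correction to the TAE (kcal/mol) from atomic terms only.

    Valid for closed-shell molecules; raises KeyError for elements that are
    not in ATOMIC_SO.
    """
    counts = parse_formula(formula)
    return -sum(ATOMIC_SO[element] * n for element, n in counts.items())


if __name__ == "__main__":
    for molecule in ["C2Cl6", "SF6", "AlCl3", "CF4", "C6H6"]:
        print(f"{molecule:6s} {atomic_so_correction_to_tae(molecule):6.2f} kcal/mol")
```

With these inputs, the sketch reproduces the values quoted above after rounding, for example −5.2 kcal mol⁻¹ for C2Cl6, −2.9 kcal mol⁻¹ for SF6, and just over 0.5 kcal mol⁻¹ in magnitude for benzene.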


3.4 The DBOC correction


Deviations from the Born–Oppenheimer approximation affect the total
atomization energies at the sub-kcal mol⁻¹ level even for small and
medium-sized hydrocarbons. For example, for our series of hydrocarbon
cages, the diagonal Born–Oppenheimer corrections (DBOCs) are 0.23
(tetrahedrane), 0.28 (triprismane), 0.33 (cubane), 0.41 (pentaprismane),
0.54 (octahedrane), and 0.78 (dodecahedrane) kcal mol⁻¹ (129). These values
are calculated at the HF/VTZ level of theory. However, it should be
pointed out that for systems with many hydrogens, correlation contributions
to the DBOC can reduce the HF DBOC contribution by up to 50%
(34, 48, 106, 128, 132, 133, 148–150). For example, for the hydrocarbon
cages, correlation contributions calculated at the ΔCCSD/VDZ level
lower the DBOC, contributing −0.08 (tetrahedrane), −0.11 (triprismane), −0.15
(cubane), −0.18 (pentaprismane), −0.22 (octahedrane), and −0.36
(dodecahedrane) kcal mol⁻¹ (129). This point is further illustrated by examining
a larger set of polycyclic aromatic hydrocarbons (PAHs) with up to
18 carbon atoms (133). The considered set of 20 PAHs includes a diverse
range of systems such as benzene, indene, naphthalene, biphenylene, biphenyl,
anthracene, pyracene, pyrene, chrysene, and terphenyl. Fig. 7 plots
the DBOC contributions to the TAEs calculated at the HF/VTZ and CCSD levels (where
CCSD = HF/VTZ + ΔCCSD and ΔCCSD = CCSD/VDZ − HF/VDZ).
The ΔCCSD correlation contribution reduces the HF/VTZ DBOC by
amounts ranging from 41% (pyracyclene) to 49% (benzene). At the
Hartree-Fock level, the DBOC contributions to the TAEs range between
0.23 (benzene) and 0.64 (terphenyl) kcal mol⁻¹. However, at the CCSD
level, the DBOC contributions to the TAEs range between 0.12 (benzene)
and 0.36 (terphenyl) kcal mol⁻¹.

Fig. 7 Overview of the DBOC contribution to the TAEs calculated at the HF (blue line)
and CCSD (orange line) levels for a series of 20 PAHs (in kcal mol⁻¹). A number of
representative PAHs are shown in the figure (see Ref. (129) for further details). The HF DBOC
contribution scaled by 0.5 (green line) is in excellent agreement with the CCSD
DBOC contribution (see text).

The above discussion illustrates that DBOC contributions clearly have to 
be considered in methods in the upper two rungs of Jacob’s Ladder (e.g., 
HEAT, Wn, and Wn-F12 (n=3, 4)). Since DBOC calculations are not 
computationally demanding (at least at the HF level), they are sometimes 
considered in methods on the second rung of Jacob’s Ladder (e.g., 
Wn-F12, n = 1, 2). However, we recommend that DBOC contributions
calculated at the HF level be scaled by a factor of 0.5. For the
above sets of PAHs and (CH)n hydrocarbon cages, scaling the HF DBOC
contribution by 0.5 results in an RMSD of merely 0.03 kcal mol⁻¹ relative
to the CCSD DBOC contribution (cf. an RMSD of 0.20 kcal mol⁻¹ for the
unscaled HF DBOC contribution).
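
The effect of the recommended scaling can be checked with a short sketch that uses only the six (CH)n cage values quoted in the text (HF/VTZ DBOCs and ΔCCSD/VDZ correlation contributions from Ref. (129)). The PAH data are not tabulated here, so the RMSDs below refer to the cages alone and need not coincide exactly with the values quoted for the combined set. The variable and function names are illustrative.

```python
import math

# (HF/VTZ DBOC, ΔCCSD/VDZ correlation contribution) in kcal/mol,
# as quoted in the text for the (CH)n hydrocarbon cages.
CAGES = {
    "tetrahedrane":  (0.23, -0.08),
    "triprismane":   (0.28, -0.11),
    "cubane":        (0.33, -0.15),
    "pentaprismane": (0.41, -0.18),
    "octahedrane":   (0.54, -0.22),
    "dodecahedrane": (0.78, -0.36),
}


def rmsd(estimates, references):
    """Root-mean-square deviation between two equally long sequences."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


hf = [hf_dboc for hf_dboc, _ in CAGES.values()]
# CCSD DBOC defined as in the text: HF/VTZ + ΔCCSD
ccsd = [hf_dboc + delta for hf_dboc, delta in CAGES.values()]
# Recommended approximation: HF DBOC scaled by 0.5
scaled_hf = [0.5 * value for value in hf]

rmsd_unscaled = rmsd(hf, ccsd)
rmsd_scaled = rmsd(scaled_hf, ccsd)

if __name__ == "__main__":
    print(f"RMSD, unscaled HF vs CCSD: {rmsd_unscaled:.2f} kcal/mol")
    print(f"RMSD, 0.5 x HF vs CCSD:    {rmsd_scaled:.2f} kcal/mol")
```

For the cages alone, the scaled HF DBOC deviates from the CCSD values by about 0.03 kcal mol⁻¹ (RMSD), compared with about 0.20 kcal mol⁻¹ without scaling, in line with the statistics quoted above for the combined set.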

Fig. 7 also shows that there is a linear correlation between the DBOC 
values and the number of electrons in the PAHs. In particular, we obtain 
squared correlation coefficients of R² = 0.983 and 0.990 at the HF and
CCSD levels, respectively. We note that a perfect linear correlation is
not expected since the examined set of PAHs is not a homologous
series. Namely, it includes nonaromatic 4- and 5-membered rings (e.g.,
biphenylene and fluorene) as well as aromatic rings connected via C–C
and –CH2– linkers (e.g., biphenyl, diphenylmethane, and terphenyl)
(see Ref. (129) for further details). 


4. Overview of accuracy and concluding remarks 


We conclude this chapter with an overview of the accuracy of several
composite ab initio methods from each rung of the composite correlated
molecular orbital theory Jacob’s Ladder. Fig. 8A gives the RMSDs and
95% confidence intervals for hybrid CCSD(T)/MPn methods (rung 1)
and pure CCSD(T) methods (rung 2) for the set of 183 TAEs for
non-multireference first- and second-row species in the W4-17 database
(103). For most of these systems, the reference TAEs are calculated at the
CCSDTQ5/CBS level of theory (from W4 and W4.2 theories). For a subset
of small molecules (e.g., C2H2, HCN, SiH4, FO, and Cl2), the reference
TAEs are calculated at the CCSDTQ56/CBS level of theory (from W4.3
and W4.4 theories). For a subset of larger molecules (e.g., C5H12, C6H6,
CH3COOH, HClO4, and SF6), the reference TAEs are calculated at the
CCSDT(Q)/CBS level of theory (from W4lite theory). The TAEs in the
W4-17 dataset are associated with a 3σ confidence interval of 1 kJ mol⁻¹
and, therefore, should be sufficiently accurate for benchmarking the
performance of CCSD(T)-based composite ab initio methods. In terms
of chemical diversity, the W4-17 database includes both organic and
inorganic species involving single and multiple bonds with varying
degrees of covalent and ionic character. The organic systems include
hydrocarbon and halogenated alkanes/alkenes/alkynes, arenes, aromatic
heterocycles, non-aromatic heterocycles, alcohols, aldehydes, ketones,
anhydrides, carboxylic acids, amines, imines, and nitriles. The inorganic
species include halogenated species, boranes, oxides, acids, hydrides, and pure
atomic clusters. The set of species used for evaluating the CCSD(T)-based
methods spans the gamut from systems dominated by a single reference
configuration (e.g., CH4, CH3OH, CH3NH2) to systems that exhibit
appreciable non-dynamical correlation effects (e.g., O2, SO3, N2O4); however,
it excludes systems exhibiting pathological non-dynamical correlation
effects (e.g., C2, O3, F2O2).

Fig. 8 (A) Overview of the error statistics for total atomization energies (TAEs) for
composite ab initio methods across the rungs of Jacob's Ladder of composite correlated
molecular orbital theory. The performance of CCSD(T)-based methods (rungs 1 and 2)
is evaluated relative to the W4-17 database. The performance of post-CCSD(T) methods
(rungs 3 and 4) is evaluated relative to a smaller set of highly accurate experimental TAEs
from ATcT (see Ref. (76) for further details). Both 95% confidence intervals (95% CIs) and
root-mean-square deviations (RMSDs) are given in kcal mol⁻¹. (B) Enlarged view of
performance for post-CCSD(T) methods (rungs 3 and 4).

Fig. 8 depicts both the RMSDs and 95% CIs to illustrate the significant 
difference in establishing chemical and benchmark accuracies using these 
two statistical metrics. We start by noting that, as expected, there is a clear 
improvement in performance along the rungs of Jacob’s Ladder of composite
ab initio methods (Fig. 8A). Apart from G4 and ccCA-PS3 theories,
the considered hybrid CCSD(T)/MPn methods (rung 1) attain RMSDs
larger than 1 kcal mol⁻¹ for TAEs. For three of the methods (CBS-QB3,
G3(MP2)B3, and ROCBS-QB3), the RMSDs are larger than 2 kcal mol⁻¹.
However, we note that this level of accuracy is still much better than
that obtained for computationally demanding MP2-based and DHDFT
methods in conjunction with a quadruple-ζ basis set. For example, the
following RMSDs are obtained for a number of representative methods:
2.3 (B2GP-PLYP), 2.6 (ωB97X-2(TQZ)), 2.8 (PWPB95), 3.4 (B2-PLYP),
3.6 (SCS-MP3), and 8.1 (SCS-MP2) kcal mol⁻¹ (103). The G3B3,
G4(MP2)-6X, and G4(MP2) methods attain RMSDs between 1 and
2 kcal mol⁻¹, and the computationally more demanding G4 and ccCA-PS3
methods attain RMSDs just below 1 kcal mol⁻¹.

The pure CCSD(T)-based methods from the second rung of Jacob’s 
Ladder attain RMSDs that are well below the 1 kcal mol⁻¹ mark. For example,
the following RMSDs are obtained for representative methods: 0.72
(W1-F12), 0.63 (W2X), and 0.55 (W2-F12) kcal mol⁻¹. Nevertheless,
these RMSDs still translate to 95% CIs that are above the 1 kcal mol⁻¹
chemical accuracy threshold (Fig. 8A).

There is a clear drop in the RMSDs and 95% CIs when moving from
the CCSD(T)/CBS methods (rung 2) to the CCSDT(Q)/CBS methods 
(rung 3) (Fig. 8A). It is important to stress that the theoretical TAEs 
from the W4-17 database can no longer be used for assessing the post- 
CCSD(T) composite ab initio methods on rungs 3 and 4. Here, these 
methods are assessed against highly accurate experimental data from 
the Active Thermochemical Tables (ATcT). Fig. 8 gives the RMSDs and 95%
CIs for the third-rung methods against a set of 18 first-row ATcT TAEs
associated with error bars <0.06 kcal mol⁻¹. This set, comprising only
first-row systems, is used here so that the HEAT and Wn methods can
be compared on an equal footing. However, we note that similar error statistics
are obtained for the Wn methods against a larger set of first- and
second-row ATcT atomization energies associated with error bars
<0.05 kcal mol⁻¹ (see Ref. (78) for further details). The RMSDs for the
third-rung methods are 0.168 (W3-F12), 0.149 (W3.2), 0.101 (HEAT-456(Q)),
0.090 (W4lite), and 0.083 (HEAT-345(Q)) kcal mol⁻¹. These
RMSDs translate to 95% CIs ranging between 0.337 (W3-F12) and
0.166 (HEAT-345(Q)) kcal mol⁻¹. Thus, as shown in Fig. 8B, nearly all
the third-rung methods attain benchmark accuracy in terms of the 95%
CIs, and all of them attain benchmark accuracy in terms of the RMSDs.

Next, let us move to the post-CCSDT(Q)/CBS methods on the
fourth rung. These methods attain RMSDs ≤0.1 kcal mol⁻¹ and 95%
CIs ≤0.2 kcal mol⁻¹. For example, we obtain the following RMSDs:
0.100 (HEAT-456QP), 0.072 (W4), 0.068 (HEAT-345QP), and 0.060
(W4.x) kcal mol⁻¹, which translate to 95% CIs of 0.200 (HEAT-456QP),
0.144 (W4), 0.135 (HEAT-345QP), and 0.120 (W4.x) kcal mol⁻¹.
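
The RMSD and 95% CI pairs quoted above are consistent, to within rounding, with a 95% CI of roughly twice the RMSD. The sketch below makes this relationship explicit and applies it to the RMSDs quoted in the text. The factor of 2 is inferred from the quoted pairs and is an assumption of this sketch, as is the helper name; only the 1 kcal mol⁻¹ chemical accuracy threshold is taken directly from the text.

```python
CHEMICAL_ACCURACY = 1.0  # kcal/mol, threshold used in the text

# RMSDs (kcal/mol) quoted in the text, grouped by rung of Jacob's Ladder.
RMSD_BY_RUNG = {
    2: {"W1-F12": 0.72, "W2X": 0.63, "W2-F12": 0.55},
    3: {"W3-F12": 0.168, "W3.2": 0.149, "HEAT-456(Q)": 0.101,
        "W4lite": 0.090, "HEAT-345(Q)": 0.083},
    4: {"HEAT-456QP": 0.100, "W4": 0.072, "HEAT-345QP": 0.068, "W4.x": 0.060},
}


def ci95_from_rmsd(rmsd: float, factor: float = 2.0) -> float:
    """Approximate 95% confidence interval from an RMSD.

    Assumption: the factor of 2 is inferred from the RMSD/CI pairs quoted
    in the text, not from a stated formula.
    """
    return factor * rmsd


if __name__ == "__main__":
    for rung, methods in RMSD_BY_RUNG.items():
        for name, value in methods.items():
            ci = ci95_from_rmsd(value)
            flag = "within" if ci <= CHEMICAL_ACCURACY else "above"
            print(f"rung {rung}  {name:12s} RMSD {value:.3f}  "
                  f"95% CI {ci:.3f}  ({flag} chemical accuracy)")
```

Under this assumption, all three second-rung methods have RMSDs below 1 kcal mol⁻¹ but 95% CIs above it, whereas every third- and fourth-rung method listed falls well within chemical accuracy on both metrics.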

Finally, it should be emphasized that the improvement in performance 
along the rungs of Jacob’s Ladder comes with a significant increase in
computational cost. For example, methods from the first rung, such as G4(MP2),
have been applied to systems as large as C60 (125), and methods from the
second rung have been applied to systems as large as the C24 carbon clusters
(130), dodecahedrane (CH)20 (129), and PAHs with up to 18 carbons (133).
Methods from the third rung can generally be applied to systems with up
to ~10 non-hydrogen atoms with current mainstream technology. For
example, W3lite theory has been applied to barbaralane (C9H10) (141)
and phosphorus sulfide isomers (P4S4) (135); and W4lite theory has been
applied to C2X6 (X = F, Cl) (103), the SF6⁻ anion (108), cyclic S8 (134),
and benzene (36). The largest systems to which methods from the fourth rung
have been applied are normally highly symmetric species with up to five
non-hydrogen atoms. For example, W4 theory has been applied to CCl4,
SiF4, cyanogen (CN)2, and tetrahedrane (CH)4 (80, 103).